AI Red Teaming

Gen AI have go through multiple levels of training:

Pre-trained on large and diverse datasets (books, websites, articles etc) in multiple languages
After training, safety and instructional post training to align model based on human feedback
Break-fix cycle with measurements + AI red teams to align model to RAI policies
Additional guardrails added to reduce harmful or inappropriate outputs

AI ingests tokenised text:

Fabrication

Alignment Gaps

Prompt injection

Confident but wrong

Leaned unintended behaviour

One input stream, no boundaries

Typical AI Application:

Last updated 2 days ago