AI Red Teaming

Gen AI have go through multiple levels of training:

  1. Pre-trained on large and diverse datasets (books, websites, articles etc) in multiple languages

  2. After training, safety and instructional post training to align model based on human feedback

  3. Break-fix cycle with measurements + AI red teams to align model to RAI policies

  4. Additional guardrails added to reduce harmful or inappropriate outputs

AI ingests tokenised text:

Fabrication
Alignment Gaps
Prompt injection

Confident but wrong

Leaned unintended behaviour

One input stream, no boundaries

Typical AI Application:

Last updated