AI Red Teaming
Gen AI have go through multiple levels of training:
Pre-trained on large and diverse datasets (books, websites, articles etc) in multiple languages
After training, safety and instructional post training to align model based on human feedback
Break-fix cycle with measurements + AI red teams to align model to RAI policies
Additional guardrails added to reduce harmful or inappropriate outputs
AI ingests tokenised text:


Fabrication
Alignment Gaps
Prompt injection
Confident but wrong
Leaned unintended behaviour
One input stream, no boundaries
Typical AI Application:

Last updated