Claude Learns Why: Anthropic Improves AI Safety Training Through Principles Over Examples
Anthropic has fundamentally improved its AI safety training. All Claude models since Haiku 4.5 now achieve perfect scores on alignment tests and avoid extortion. The success rests on three factors: teaching principles rather than just examples, using high-quality training data, and generalizing beyond known scenarios.



