Steerability of Language Models Can Be Predicted Early
15 June 2026
AI Models, Claude AI

A trainable classifier that reads early hidden states predicts whether activation steering will succeed, reaching a macro-F1 score of 0.7 without requiring complete generations.