LLM agents can commit early to an incorrect interpretation without final answer correctness revealing this — hidden-state convergence enables early detection of this failure mode.
Hidden-state alignment reduces sampling variance, closes the student-teacher gap more effectively, and trains with less memory and computational time than output-only distillation.