STARE: Token-Level Stability Procedure Against Policy Entropy Collapse in GRPO Training19. June 2026AI Models, Claude AISTARE uses surprisal metrics and selective advantage reweighting to maintain policy entropy stability across long training sequences while improving accuracy by 4–8%. Share on: