CHERRL enables reproducible analysis of reward hacking mechanisms through controlled bias injection and automatic detection of exploitation onset in LLM-based training.
Anthropic expands Mythos access to 150 new organizations; security experts warn of structural changes driven by frontier AI models and the risk of vulnerability chaining.
ThoughtFold identifies and removes redundant exploration steps in reasoning chains, reducing token consumption by 56% for DeepSeek-R1-Distill-Qwen-7B while maintaining state-of-the-art accuracy.
Long-horizon iterative improvement, not single high-quality responses, is the critical capability for autonomous AI agents tackling real-world engineering tasks.
STRIDE formalizes training data attribution as a sparse recovery problem in activation space, running an order of magnitude faster than gradient-based methods.
Streaming-based multi-agent reasoning reduces latency through pipelining and also improves accuracy, because early, more reliable reasoning steps guard against errors in later ones.
GRAIL uses gradient activation saliency to weight relevant reasoning steps more heavily than irrelevant tokens during training, achieving a 3.60% accuracy improvement without separate process-level supervision.
KVarN uses improved token-scale normalization to reduce error accumulation when quantizing KV caches to 2-bit precision, achieving state-of-the-art results on MATH500, AIME24, and HumanEval.