A CPU-based RL controller optimizes adaptive sampling during test-time scaling, reducing computational overhead and latency compared to heuristic methods.
VaSE achieves higher accuracy than existing sparse-attention methods at 4x KV-cache compression, thereby reducing the memory bottleneck of reasoning models.
NVIDIA’s OmniDreams generates complex vehicle simulations in real time, generalizes better to rare scenarios, and can serve as a foundation for more efficient driving policy models.
Successful domain specialization of LLMs requires careful tuning of learning rate, data-mixing ratios, and checkpoint selection to avoid catastrophic forgetting.
To cope with AI agents that increased code volume by 1,400 percent in 2026, GitHub is adapting its infrastructure and workflows, integrating AI into existing systems such as CI/CD, PR review, and open-source collaboration.
A 20B search agent achieves 0.730 average curated recall across eight benchmarks by applying RL training to explicit state rather than folding state management into the policy.
PaW trains environment models during policy training on the same RL rollouts, consistently improving agent performance without additional simulators or added inference cost.
Edamame introduces host-based runtime verification to detect code drift and misuse of autonomous AI coding agents before confidential data is exfiltrated.
Anthropic is expanding its AI-powered code security program to 150 new partners from critical infrastructure sectors, after the initial 50 partners identified more than 10,000 critical vulnerabilities.