RL-Controlled Sampling for Test-Time Scaling in Large Language Models

3 June 2026

AI Models, Claude Code

A CPU-based RL controller optimizes adaptive sampling during test-time scaling, reducing computational overhead and latency compared to heuristic methods.