Harness-1: Search Agent with Externalized State Management Trained via RL

2. June 2026
AI Models, Claude Code

A 20B search agent achieves 0.730 average curated recall across eight benchmarks by training with RL on explicit state rather than integrating state management into the policy.