
Harness-1: Search Agent with Externalized State Management Trained via RL

The point: A 20B search agent achieves 0.730 average curated recall across eight benchmarks by running RL over explicit, externally managed search state instead of folding state management into the policy.

Anthropic and Princeton University present Harness-1, a 20-billion-parameter search agent that moves state management out of the policy network and into an external harness. The approach is reported to make reinforcement learning more efficient and to improve generalization to new domains.

Harness-1 starts from the observation that conventional search agents are trained as policies over growing transcripts: the model must both make semantic search decisions and handle the bookkeeping (tracking observed evidence, assessing the relevance of findings, maintaining open constraints, and verifying claims). This formulation forces reinforcement learning to optimize both tasks at once.

The harness architecture shifts state management to an external, environment-side component. The harness manages a candidate pool, an importance-tagged curated set, compact evidence links, verification protocols, and deduplicated, compressed observations with budget-aware context rendering. The policy is left with the semantic decisions only: what to search for, which documents to retain or discard, what to verify, and when to stop.
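
The division of labor described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the class and method names (`Harness`, `observe`, `retain`, `discard`, `render`), the integer importance tags, and the character-based budget are hypothetical and are not taken from the Harness-1 code. Evidence links, verification, and compression are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    text: str


@dataclass
class Harness:
    """Hypothetical environment-side state; not the actual Harness-1 API."""
    budget_chars: int = 200                          # assumed context budget
    candidates: dict = field(default_factory=dict)   # doc_id -> Document
    curated: dict = field(default_factory=dict)      # doc_id -> importance tag
    seen_texts: set = field(default_factory=set)     # normalized texts, for dedup

    def observe(self, docs):
        """Add search results to the candidate pool, skipping duplicates."""
        added = []
        for doc in docs:
            key = " ".join(doc.text.lower().split())
            if doc.doc_id in self.candidates or key in self.seen_texts:
                continue
            self.seen_texts.add(key)
            self.candidates[doc.doc_id] = doc
            added.append(doc.doc_id)
        return added

    def retain(self, doc_id, importance):
        """Policy decision: keep a candidate, tagged with an importance level."""
        if doc_id not in self.candidates:
            raise KeyError(doc_id)
        self.curated[doc_id] = importance

    def discard(self, doc_id):
        """Policy decision: drop a document from the curated set."""
        self.curated.pop(doc_id, None)

    def render(self):
        """Render curated documents, most important first, within the budget."""
        lines, used = [], 0
        ranked = sorted(self.curated.items(), key=lambda kv: (-kv[1], kv[0]))
        for doc_id, importance in ranked:
            line = f"[{importance}] {doc_id}: {self.candidates[doc_id].text}"
            if used + len(line) > self.budget_chars:
                break
            lines.append(line)
            used += len(line)
        return "\n".join(lines)
```

In this sketch the policy only issues `retain` and `discard` decisions, while deduplication and budget handling stay on the environment side, so the context shown to the model is rebuilt from state on each step rather than accumulated as a transcript.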

In evaluations across eight retrieval benchmarks spanning web, finance, patents, and multi-hop QA, Harness-1 achieves an average curated recall of 0.730 and outperforms the next-strongest open search subagent by 11.4 points. The model remains competitive with significantly larger frontier models. Gains are particularly pronounced on held-out transfer benchmarks outside the training domains, suggesting that RL over explicit search state produces generalizable retrieval behaviors.
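
The summary does not define curated recall. One plausible reading, assumed here purely for illustration, is the fraction of a query's relevant documents that end up in the agent's curated set, averaged across benchmarks with equal weight. The function names and the toy inputs below are hypothetical and do not reproduce the paper's evaluation.

```python
def curated_recall(curated_ids, relevant_ids):
    """Fraction of relevant documents present in the curated set (assumed definition)."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    return len(relevant & set(curated_ids)) / len(relevant)


def mean_recall(per_benchmark):
    """Unweighted mean over benchmarks (assumed aggregation)."""
    return sum(per_benchmark.values()) / len(per_benchmark)


# Toy values only, not results from the paper.
toy = {
    "benchmark_x": curated_recall(["d1", "d2", "d5"], ["d1", "d2", "d3", "d4"]),
    "benchmark_y": curated_recall(["d7"], ["d7"]),
}
print(mean_recall(toy))
```

Under this reading, documents in the curated set that are not relevant do not lower the score; only missed relevant documents do.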

Code is available at https://github.com/pat-jj/harness-1. For CTOs: Moving state management out of the policy reduces memory and compute pressure on the neural network and makes RL training of agents more scalable, with better domain generalization. The pattern is transferable to agent-based systems beyond search.


Source: arxiv.org · Published May 31, 2026
Lumi AI News — AI-assisted curation pursuant to Article 50 EU AI Act. Paraphrase and classification by Lumi News Pipeline v1.2.9.
