In Brief: A new multi-agent harness with planner, generator, and evaluator agents enables Claude to build full-stack applications autonomously over multi-hour sessions. Explicit context resets and structured handoffs between agent sessions are key to its success.

Prithvi Rajasekaran of the Anthropic Labs team presents a multi-agent harness architecture that lets Claude autonomously develop complete full-stack applications over multiple hours. Combining generator and evaluator agents with structured context management pushes past the limits that earlier setups ran into.

Building a robust harness for long-running AI coding tasks takes deliberate design. Earlier attempts to improve Claude's output through prompt engineering and simple harness design delivered respectable results, but they quickly reached their limits.

Rajasekaran identified two core problems. First, on complex, long-running tasks, models lose coherence as the context window fills. Second, Claude Sonnet 4.5 exhibits “context anxiety”: the model begins wrapping up its work prematurely as it approaches the context limit.

The solution lies in explicit context resets rather than compression alone. Compression summarizes earlier conversation segments so the same agent can keep going, but because that agent never gets a fresh start, its context anxiety can persist. A reset gives the next agent a clean slate, at the cost of requiring a structured handoff artifact that carries enough state for it to pick up the work.
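To make the idea of a handoff artifact concrete, here is a minimal sketch in Python. The `Handoff` fields, the JSON encoding, and the `fresh_session_prompt` helper are all assumptions chosen for illustration; the source does not describe the format the actual harness uses.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Handoff:
    """Hypothetical state passed from one agent session to the next."""

    goal: str
    completed: list[str] = field(default_factory=list)
    remaining: list[str] = field(default_factory=list)
    notes: str = ""

    def to_json(self) -> str:
        # Serialize so the artifact can be written to disk between sessions.
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, text: str) -> "Handoff":
        return cls(**json.loads(text))


def fresh_session_prompt(handoff: Handoff) -> str:
    """Build the opening prompt for a new session.

    The new session receives only this artifact, not the earlier
    conversation, which is what separates a reset from summarizing in place.
    """
    return (
        "You are continuing an in-progress build.\n"
        "State from the previous session:\n"
        f"{handoff.to_json()}\n"
        "Start with the first remaining task."
    )
```

The point of the sketch is the trade-off described above: whatever is not written into the artifact is lost to the next session, so the artifact has to carry enough state on its own.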

The resulting three-agent architecture comprises a planner, a generator, and an evaluator. The generator-evaluator pairing is modeled on Generative Adversarial Networks (GANs): one agent produces work and the other critiques it, with the evaluator grading against explicit criteria that translate subjective quality judgments into measurable terms. Together, these approaches let Claude develop complex full-stack applications whose output is checked against those criteria, in settings where human intervention was previously necessary.
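The control flow of such a planner, generator, and evaluator loop might look roughly like the following Python sketch. Everything here is hypothetical: the three agents are passed in as plain callables, and the score threshold, the round limit, and the feedback string format are illustrative assumptions, not details from the source.

```python
from typing import Callable

Scores = dict[str, int]  # criterion name -> score, e.g. on a 0-10 scale


def run_harness(
    spec: str,
    plan: Callable[[str], list[str]],
    generate: Callable[[str, str], str],
    evaluate: Callable[[str, str], Scores],
    threshold: int = 7,
    max_rounds: int = 5,
) -> list[tuple[str, str, int]]:
    """Run each planned task through generate/evaluate rounds.

    Returns (task, final_build, rounds_used) for every task.
    """
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")

    results = []
    for task in plan(spec):
        feedback = ""
        for round_no in range(1, max_rounds + 1):
            build = generate(task, feedback)
            scores = evaluate(task, build)
            # Any criterion below the threshold sends the task back.
            failing = sorted(c for c, s in scores.items() if s < threshold)
            if not failing:
                break
            feedback = "Improve: " + ", ".join(failing)
        results.append((task, build, round_no))
    return results
```

In a real harness each callable would wrap a separate agent session. Keeping the evaluator separate from the generator means the work is graded by something other than the agent that produced it.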

Source: www.anthropic.com
