
StreamMA: Streaming Protocol Reduces Latency in Multi-Agent Reasoning Systems

In brief: Streaming-based multi-agent reasoning reduces latency through pipelining while simultaneously improving accuracy because early, more reliable reasoning steps protect against erroneous later steps.

A new system called StreamMA passes reasoning steps between agents as they are generated instead of waiting for each agent to finish. This lowers latency, with savings that grow with pipeline depth, and unexpectedly also improves accuracy. The principle: early reasoning steps are more reliable than late ones, so transferring them early keeps erroneous later steps from misleading downstream agents.

StreamMA breaks with the classic paradigm of multi-agent reasoning systems, which until now have followed a “generate-then-transfer” pattern: one agent completes its entire reasoning process before passing the result to the next agent. As a result, end-to-end latency scales linearly with pipeline depth.

The new streaming protocol sends each reasoning step to downstream agents immediately after it is generated, so adjacent agents can work in parallel. The researchers show, both theoretically and empirically, that quality is not uniform across a multi-step reasoning trace: early steps are typically more reliable than late ones. Downstream agents that build on the more trustworthy early results are therefore less exposed to errors in later steps.
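The latency difference between the two patterns can be illustrated with a toy schedule model. This is a minimal sketch and not the authors' implementation: the function names, the uniform `step_time`, and the rule that a downstream agent can produce its step k as soon as the upstream agent has emitted step k are all assumptions made for illustration.

```python
def sequential_latency(depth: int, steps: int, step_time: float = 1.0) -> float:
    """Generate-then-transfer: each agent waits for the full upstream trace."""
    return depth * steps * step_time


def streaming_latency(depth: int, steps: int, step_time: float = 1.0) -> float:
    """Streaming (assumed schedule): agent a starts step k once it has
    finished its own step k-1 and agent a-1 has emitted step k."""
    finish = [[0.0] * steps for _ in range(depth)]
    for a in range(depth):
        for k in range(steps):
            own_prev = finish[a][k - 1] if k > 0 else 0.0
            upstream = finish[a - 1][k] if a > 0 else 0.0
            finish[a][k] = max(own_prev, upstream) + step_time
    return finish[-1][-1]


if __name__ == "__main__":
    # Hypothetical chain of 4 agents, 8 reasoning steps each, 1 time unit per step.
    print(sequential_latency(4, 8))  # 32.0
    print(streaming_latency(4, 8))   # 11.0
```

Under these assumptions the sequential pattern costs depth × steps time units, while the streamed pipeline costs steps + depth − 1, so the gap widens as the chain gets deeper. Real agents have uneven step times and may need more than one upstream step before they can proceed, so actual savings would differ.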

StreamMA was evaluated on eight reasoning benchmarks (mathematics, science, code generation) using two state-of-the-art models (Claude Opus 4.6, GPT-5.4) across three topologies (chain, tree, graph). On average, StreamMA showed a 7.3 percentage point improvement over baselines, with a maximum of 22.4 percentage points on HMMT 2026 with Claude Opus 4.6-high.

Additionally, the authors discovered a “step-level scaling law”: increasing the number of reasoning steps per agent continuously improves both effectiveness and efficiency. This new scaling dimension is orthogonal to classic agent-count scaling and can be combined with it.


Source: arxiv.org · Published June 2, 2026
Lumi AI News — AI-assisted curation in accordance with Article 50 of the EU AI Act. Paraphrasing and classification by Lumi News Pipeline v1.2.9.