The gist: Qwen-AgentWorld uses language models as learned environment simulators that predict environment behavior through extended chain-of-thought reasoning, so autonomous agents can be trained efficiently against simulated environments.
Alibaba has released two language models (35 billion and 397 billion parameters) that simulate environment dynamics, enabling agents to be trained across seven domains without real environment interaction.
Alibaba has introduced two foundation models for agent-based environment simulation: Qwen-AgentWorld-35B-A3B and Qwen-AgentWorld-397B-A17B. According to the researchers, these are the first language models capable of simulating agent-oriented environments across seven domains through extended chain-of-thought reasoning. Over 10 million interaction trajectories from real environments served as training data.
Development followed a three-stage training pipeline. In the CPT (Continual Pre-Training) stage, general world-modeling capabilities were instilled using state-transition data and expanded domain-specific corpora. The SFT (Supervised Fine-Tuning) stage activated the ability to predict the next state. In the RL (Reinforcement Learning) stage, simulation accuracy was optimized with a hybrid reward framework that combines rubric-based and rule-based rewards. In parallel, the team developed AgentWorldBench, a benchmark comprising real interactions from five frontier models across nine established benchmarks.
The models serve two complementary purposes. First, as a decoupled environment simulator, they can controllably simulate thousands of real environments for agentic RL, yielding performance gains beyond training on real environments alone. Second, world-model training acts as an effective warm-up for a unified agent foundation model, improving downstream performance across seven agent benchmarks.
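The first use case, a world model standing in for the real environment during agent training, can be sketched as a small reset/step wrapper. This is a hedged illustration, not the Qwen-AgentWorld API: `SimulatedEnv`, `WorldModel`, `toy_world_model` and `rollout` are hypothetical names, and the toy function below replaces what would in practice be a call to the language model.

```python
from typing import Callable, List, Tuple

# Assumed interface: (history of (action, observation) pairs, new action)
# -> predicted next observation. A real setup would query the LLM here.
WorldModel = Callable[[List[Tuple[str, str]], str], str]


class SimulatedEnv:
    """Wraps a world model so an agent can interact with it like an environment."""

    def __init__(self, world_model: WorldModel,
                 initial_observation: str, max_steps: int = 8) -> None:
        self.world_model = world_model
        self.initial_observation = initial_observation
        self.max_steps = max_steps
        self.history: List[Tuple[str, str]] = []
        self.steps = 0

    def reset(self) -> str:
        """Start a new episode and return the first observation."""
        self.history = []
        self.steps = 0
        return self.initial_observation

    def step(self, action: str) -> Tuple[str, bool]:
        """Ask the world model for the next observation; stop at max_steps."""
        observation = self.world_model(self.history, action)
        self.history.append((action, observation))
        self.steps += 1
        return observation, self.steps >= self.max_steps


def toy_world_model(history: List[Tuple[str, str]], action: str) -> str:
    """Stand-in for the learned simulator (illustration only)."""
    return f"step {len(history) + 1}: executed {action!r}"


def rollout(env: SimulatedEnv,
            policy: Callable[[str], str]) -> List[Tuple[str, str]]:
    """Collect one episode of (action, observation) pairs from the simulator."""
    observation = env.reset()
    trajectory: List[Tuple[str, str]] = []
    done = False
    while not done:
        action = policy(observation)
        observation, done = env.step(action)
        trajectory.append((action, observation))
    return trajectory


env = SimulatedEnv(toy_world_model, "start", max_steps=3)
trajectory = rollout(env, lambda obs: "ls")
```

Because the agent only sees `reset` and `step`, the same policy code could in principle be pointed at either a real environment or the simulator. That is the sense in which the simulation is decoupled from the agent.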
The code is available on GitHub (github.com/QwenLM/Qwen-AgentWorld). According to the report, the results demonstrate significant improvements over existing frontier models.
Source: arxiv.org · Published 22 June 2026
Lumi AI News — AI-assisted curation pursuant to Art. 50 EU AI Act. Paraphrase and classification by Lumi News Pipeline v1.7.1.