The gist: Qwen-AgentWorld trains language models on more than 10 million interaction trajectories so that they can act as environment simulators in which AI agents are trained; the approach improves agent performance on benchmarks across seven domains.
Alibaba’s Qwen team has developed Qwen-AgentWorld, which it describes as the first language models capable of simulating environment dynamics across seven different domains. The framework trains large language models to predict how agent environments evolve and to simulate them in a controllable way.
Alibaba has released two new Language World Models: Qwen-AgentWorld-35B-A3B and Qwen-AgentWorld-397B-A17B. Both specialize in predicting environment dynamics from observations and actions. Training used more than 10 million interaction trajectories recorded in real environments across seven domains.
Development followed a three-stage pipeline. Continuous Pre-Training (CPT) instills general world-modeling capabilities using state-transition data and an expanded corpus of domain literature. Supervised Fine-Tuning (SFT) teaches the model to reason about its state predictions. Reinforcement Learning (RL) sharpens simulation accuracy with a hybrid of rule-based and rubric-based rewards. For evaluation, the authors introduced AgentWorldBench, a benchmark built from real interactions of five leading models on nine established agent benchmarks.
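The summary does not say how the two reward types are combined. One plausible reading, shown here only as a minimal sketch, routes each prediction: a structured predicted state (assumed here to be JSON) is scored by an exact-match rule, and anything else is scored against a rubric of checks. The function names, the JSON assumption, and the keyword checks standing in for an LLM judge are all hypothetical, not taken from the paper.

```python
import json
from typing import Callable, Optional, Sequence

# Hypothetical type: a rubric item checks one property of a predicted state.
RubricItem = Callable[[str], bool]


def rule_reward(predicted: str, reference: str) -> Optional[float]:
    """Exact-match reward for structured (JSON) states; None when no rule applies."""
    try:
        return 1.0 if json.loads(predicted) == json.loads(reference) else 0.0
    except json.JSONDecodeError:
        return None  # free-form text: fall back to the rubric


def rubric_reward(predicted: str, rubric: Sequence[RubricItem]) -> float:
    """Fraction of rubric items the prediction satisfies (0.0 for an empty rubric)."""
    if not rubric:
        return 0.0
    return sum(item(predicted) for item in rubric) / len(rubric)


def hybrid_reward(predicted: str, reference: str, rubric: Sequence[RubricItem]) -> float:
    """Assumed routing: use the rule when it applies, otherwise the rubric."""
    rule = rule_reward(predicted, reference)
    return rule if rule is not None else rubric_reward(predicted, rubric)


# Keyword checks stand in for an LLM judge that would grade each rubric item.
rubric = [
    lambda s: "login" in s.lower(),
    lambda s: "error" in s.lower(),
    lambda s: "captcha" in s.lower(),
]
print(hybrid_reward('{"files": ["a.txt"]}', '{"files": ["a.txt"]}', rubric))  # 1.0
print(hybrid_reward('{"files": []}', '{"files": ["a.txt"]}', rubric))  # 0.0
print(hybrid_reward("A login form with an error message.", "", rubric))
```

Routing rather than averaging keeps verifiable states on a strict, unambiguous signal and reserves the softer rubric score for outputs no rule can check; a weighted blend would be an equally reasonable assumption.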
The models support two complementary paradigms. First, as a decoupled environment simulator, Qwen-AgentWorld can stand in for thousands of simulated environments, making agent RL training scalable and controllable; according to the authors, agents trained this way outperform those trained in real environments alone. Second, as a unified agent foundation model, the world-modeling training serves, in the authors' words, as a highly effective warm-up for downstream agent benchmarks in all seven domains.
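To make the "decoupled environment simulator" idea concrete, the sketch below wraps a world model behind a reset/step interface so that an agent's rollout loop cannot tell it apart from a real environment. It is an illustration under stated assumptions, not the Qwen-AgentWorld API: `SimulatedEnv`, `rollout`, and the `WorldModel` signature are hypothetical, and a tiny rule-based stub that mimics a file system replaces the language model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical signature: (history of (action, observation) pairs, new action) -> next observation.
WorldModel = Callable[[List[Tuple[str, str]], str], str]


@dataclass
class SimulatedEnv:
    """Environment whose dynamics come from a world model instead of a real system."""
    world_model: WorldModel
    initial_observation: str
    history: List[Tuple[str, str]] = field(default_factory=list)

    def reset(self) -> str:
        self.history = []
        return self.initial_observation

    def step(self, action: str) -> str:
        # The world model predicts the next observation from the full interaction history.
        observation = self.world_model(self.history, action)
        self.history.append((action, observation))
        return observation


def rollout(env: SimulatedEnv, policy: Callable[[str], str], max_steps: int) -> List[Tuple[str, str]]:
    """Collect one trajectory; the policy returns "stop" to end the episode early."""
    observation = env.reset()
    trajectory = []
    for _ in range(max_steps):
        action = policy(observation)
        if action == "stop":
            break
        observation = env.step(action)
        trajectory.append((action, observation))
    return trajectory


def toy_world_model(history: List[Tuple[str, str]], action: str) -> str:
    """Stand-in for the LLM: replays earlier `touch` commands to simulate a file system."""
    files = {a.split(" ", 1)[1] for a, _ in history if a.startswith("touch ")}
    if action.startswith("touch "):
        return f"created {action.split(' ', 1)[1]}"
    if action == "ls":
        return " ".join(sorted(files)) or "(empty)"
    return f"unknown command: {action}"


scripted_actions = iter(["ls", "touch a.txt", "ls", "stop"])
env = SimulatedEnv(toy_world_model, "shell ready")
trajectory = rollout(env, lambda obs: next(scripted_actions), max_steps=10)
print(trajectory)
# [('ls', '(empty)'), ('touch a.txt', 'created a.txt'), ('ls', 'a.txt')]
```

Because each simulated environment is just a world-model call plus its own history, creating many of them (for example one `SimulatedEnv` per task prompt) is cheap, which illustrates why this setup can scale to thousands of environments.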
According to Alibaba, the Qwen-AgentWorld models significantly outperform existing frontier models. Code and models are available at https://github.com/QwenLM/Qwen-AgentWorld.
Source: arxiv.org · Published June 22, 2026
Lumi AI News — AI-assisted curation pursuant to Article 50 EU AI Act. Paraphrase and classification by Lumi News Pipeline v1.7.1.