Bottom line: A systematic data curation pipeline lets agentic models generalize across diverse task types while matching or beating the results of specialized models.

The OpenThoughts-Agent project releases a fully open data curation pipeline for training agentic language models. A Qwen3-32B model fine-tuned on the resulting dataset averages 44.8% accuracy across seven agentic benchmarks, ahead of existing open approaches.

The research group behind OpenThoughts-Agent has developed an open data curation pipeline that assembles training data for agentic language models. The work addresses a gap: projects such as SWE-Smith, SERA, and Nemotron-Terminal typically specialize in individual benchmarks, and a methodology for training models that generalize across diverse agentic tasks has been lacking.

The research team ran more than 100 controlled ablation studies to examine each stage of the pipeline systematically. These studies highlighted the importance of task sources and their diversity. The final training dataset consists of 100,000 examples. A Qwen3-32B model fine-tuned on this dataset achieved an average accuracy of 44.8% across seven agentic benchmarks, a lead of 3.9 percentage points over Nemotron-Terminal-32B (40.9%), the strongest existing open agentic model.
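The gap between the two models can be read either as an absolute difference in percentage points or as a relative gain. The short Python sketch below works through both readings using only the two averages reported above. The function names are hypothetical and chosen for illustration; the relative figure is derived here and is not reported in the source.

```python
def percentage_point_gap(score: float, baseline: float) -> float:
    """Absolute difference between two accuracies given in percent."""
    return score - baseline


def relative_gain_percent(score: float, baseline: float) -> float:
    """Improvement over the baseline, expressed as a percentage of the baseline."""
    return 100.0 * (score - baseline) / baseline


# Average accuracies quoted in the article (percent, across seven benchmarks).
openthoughts_agent_32b = 44.8
nemotron_terminal_32b = 40.9

gap = percentage_point_gap(openthoughts_agent_32b, nemotron_terminal_32b)
rel = relative_gain_percent(openthoughts_agent_32b, nemotron_terminal_32b)

print(f"Gap: {gap:.1f} percentage points")    # Gap: 3.9 percentage points
print(f"Relative gain: {rel:.1f}%")           # Relative gain: 9.5%
```

The two numbers answer different questions: 3.9 percentage points is the difference in average accuracy, while roughly 9.5% is how much larger the new average is relative to the baseline.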

The approach also scales well: in compute-controlled comparisons, models trained on the OpenThoughts-Agent data outperform models trained on alternative open datasets at every training set size. The project publishes the training datasets, data pipeline, experimental data, and models on openthoughts.ai to support future open research in this field.

Source: arxiv.org · Published June 22, 2026
Lumi AI News — AI-assisted curation pursuant to Article 50 EU AI Act. Paraphrase and classification by Lumi News Pipeline v1.7.1.


OpenThoughts-Agent: Systematic Data Curation for Agentic Models
