
Sumi: Uniform-Diffusion Language Model with 7 Billion Parameters Trained from Scratch

At a glance: Sumi is the first openly available Uniform-Diffusion language model trained from scratch at the 7-billion-parameter scale. It fills a research gap alongside the well-established autoregressive and masked diffusion approaches.

Researchers have trained Sumi, a fully open 7-billion-parameter language model based on Uniform Diffusion, from scratch on 1.5 trillion tokens. The model thus provides a reference implementation for research into a previously underexplored alternative to autoregressive and masked diffusion models.

Diffusion models have emerged as a promising alternative to autoregressive architectures. Uniform Diffusion Language Models (UDLMs) allow every token to be updated at every generation step, which in theory permits more flexible generation strategies. However, no UDLM had previously been pretrained from scratch at a large parameter scale with a correspondingly large training budget. Autoregressive and masked diffusion models already have scalable reference implementations that drive research; Uniform Diffusion lacked such an anchor point.
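To make the difference described above more concrete, here is a minimal toy sketch in Python. It assumes the standard formulations of the two corruption (forward) processes: masked diffusion replaces tokens with a dedicated [MASK] symbol, while uniform diffusion replaces them with tokens drawn uniformly at random from the vocabulary. It then shows one heavily simplified denoising step for each. All names here (`MASK_ID`, `mask_corrupt`, `uniform_corrupt`, `masked_step`, `uniform_step`, and the `oracle` denoiser) are hypothetical illustrations, not Sumi's code. Real samplers unmask or resample only part of the sequence per step and use a trained network instead of an oracle.

```python
import random
from typing import Callable, List

MASK_ID = -1  # hypothetical id for the [MASK] symbol used by masked diffusion


def mask_corrupt(tokens: List[int], t: float, rng: random.Random) -> List[int]:
    """Masked (absorbing-state) forward process: each token becomes [MASK] with probability t."""
    return [MASK_ID if rng.random() < t else tok for tok in tokens]


def uniform_corrupt(tokens: List[int], t: float, vocab_size: int,
                    rng: random.Random) -> List[int]:
    """Uniform forward process: with probability t a token is resampled uniformly
    from the vocabulary (it may land on its original value). No [MASK] symbol is
    used, so a corrupted sequence still consists of ordinary tokens."""
    return [rng.randrange(vocab_size) if rng.random() < t else tok for tok in tokens]


def masked_step(xt: List[int], predict: Callable[[List[int]], List[int]]) -> List[int]:
    """Simplified masked-diffusion update: only [MASK] positions are filled in;
    tokens that are already visible stay fixed."""
    guess = predict(xt)
    return [g if x == MASK_ID else x for x, g in zip(xt, guess)]


def uniform_step(xt: List[int], predict: Callable[[List[int]], List[int]]) -> List[int]:
    """Simplified uniform-diffusion update: every position may be rewritten,
    including tokens that were produced earlier."""
    return list(predict(xt))


if __name__ == "__main__":
    rng = random.Random(0)
    clean = [5, 2, 7, 1]
    print(mask_corrupt(clean, 0.5, rng))
    print(uniform_corrupt(clean, 0.5, vocab_size=10, rng=rng))

    def oracle(xt: List[int]) -> List[int]:
        # Hypothetical stand-in for a trained denoiser: always predicts the clean text.
        return list(clean)

    # Position 1 holds a wrong token in both cases; position 2 is masked / wrong.
    print(masked_step([5, 9, MASK_ID, 1], oracle))  # [5, 9, 7, 1]: error at position 1 persists
    print(uniform_step([5, 9, 3, 1], oracle))       # [5, 2, 7, 1]: both errors are revised
```

With the oracle standing in for a model, the masked step fills the masked slot but cannot revisit the incorrect visible token, whereas the uniform step can rewrite any position. This is the flexibility the paragraph refers to; how much it helps in practice depends on the trained model and the sampler.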

Sumi (“ink” in Japanese) closes this gap. The 7-billion-parameter model was trained on 1.5 trillion tokens from publicly available corpora. Compared with autoregressive models trained on similar budgets, Sumi achieves comparable performance on knowledge, reasoning, and code benchmarks but falls short on common-sense tasks, a shortfall that reflects the academic-heavy composition of its training data.

The developers have released the model weights, checkpoints, the complete training recipe, and the specification of the data mixture as open source. The release is intended to let the community study Uniform Diffusion at scale and to advance research into aspects of this model class that are still poorly understood.


Source: arxiv.org · Published June 16, 2026
Lumi AI News — AI-assisted curation in accordance with Article 50 EU AI Act. Paraphrase and classification by Lumi News Pipeline v1.7.1.
