iLLaDA demonstrates that fully bidirectional diffusion training from scratch can be a competitive path to strong language models, even without autoregressive training.
Sumi is the first openly available uniform-diffusion language model trained from scratch at the 7-billion-parameter scale, and it addresses a research gap between the established autoregressive and masked-diffusion approaches.