iLLaDA: 8B Language Model Trained with Bidirectional Diffusion

26 June 2026
AI Models

iLLaDA demonstrates that fully bidirectional diffusion training from scratch can be a competitive path to strong language models, even without autoregressive training.
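
To make "bidirectional diffusion training" concrete, here is a minimal sketch of a masked-diffusion objective in the style of the earlier LLaDA model, assuming iLLaDA uses a similar objective (the teaser does not say). Each token is masked independently with a random ratio t. A bidirectional model predicts every masked token from the whole noisy sequence, and the cross-entropy on masked positions is weighted by 1/t. The `MASK` id, the `Model` type and the `uniform_model` baseline are hypothetical placeholders for a real tokenizer and Transformer.

```python
import math
import random
from typing import Callable, Dict, List

MASK = -1  # hypothetical id for the [MASK] token

# A model maps a (possibly masked) sequence to one token->probability dict per position.
Model = Callable[[List[int]], List[Dict[int, float]]]


def corrupt(tokens: List[int], t: float, rng: random.Random) -> List[int]:
    """Forward process: replace each token with MASK independently with probability t."""
    return [MASK if rng.random() < t else tok for tok in tokens]


def masked_diffusion_loss(tokens: List[int], model: Model, rng: random.Random) -> float:
    """Single-sample estimate of a LLaDA-style masked diffusion loss (sketch)."""
    t = rng.uniform(1e-3, 1.0)      # noise level, kept away from 0 so 1/t stays finite
    noisy = corrupt(tokens, t, rng)
    probs = model(noisy)            # the model sees the whole sequence: left and right context
    nll = 0.0
    for i, tok in enumerate(tokens):
        if noisy[i] == MASK:        # only masked positions contribute to the loss
            nll -= math.log(max(probs[i].get(tok, 0.0), 1e-12))
    return nll / (t * len(tokens))  # 1/t weighting, normalized by sequence length


def uniform_model(vocab_size: int) -> Model:
    """Untrained baseline: uniform distribution over the vocabulary at every position."""
    return lambda seq: [{v: 1.0 / vocab_size for v in range(vocab_size)} for _ in seq]


rng = random.Random(0)
print(masked_diffusion_loss([3, 1, 4, 1, 5, 2], uniform_model(8), rng))
```

For any fixed t, the expected number of masked tokens is t times the sequence length. The 1/t weighting therefore makes the uniform baseline's expected loss exactly log(8), whatever noise level is drawn. Unlike a left-to-right model, nothing in this objective depends on token order.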

NVIDIA Blackwell on Amazon SageMaker: Memory and Precision for Larger Models

25 June 2026
AI Models, Google

Blackwell’s 180–268 GB of memory per GPU enables larger batch sizes and longer sequences during training. This reduces communication overhead and lets models that previously required multi-node setups train on a single node.
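
A back-of-the-envelope calculation shows why per-GPU memory matters for fitting a model on one node. The sketch below assumes mixed-precision training with Adam at roughly 16 bytes per parameter: bf16 weights and gradients, fp32 master weights, and two fp32 Adam moments. It ignores activations, so real limits are lower. The 180 and 268 GB figures come from the summary above. The 8-GPU node size and the full-sharding assumption are illustrative.

```python
# Rough upper bound on model size for full training, counting only parameter-related state.
# Assumption: bf16 weights (2) + bf16 grads (2) + fp32 master weights (4) + Adam m (4) + Adam v (4).
BYTES_PER_PARAM = 2 + 2 + 4 + 4 + 4  # = 16 bytes


def max_params_billions(gpu_mem_gb: float, num_gpus: int = 1) -> float:
    """Largest parameter count (in billions) whose training state fits in memory.

    With num_gpus > 1 this assumes the state is fully sharded across GPUs
    (ZeRO-3/FSDP style), so the usable memory is the sum over all GPUs.
    Activations, temporary buffers and fragmentation are ignored.
    """
    total_bytes = gpu_mem_gb * 1e9 * num_gpus
    return total_bytes / BYTES_PER_PARAM / 1e9


for mem in (180, 268):
    print(f"{mem} GB/GPU: {max_params_billions(mem):.2f}B on 1 GPU, "
          f"{max_params_billions(mem, 8):.1f}B sharded over 8 GPUs")
```

Under these assumptions, a single 180 GB GPU holds the training state of about an 11.25B-parameter model (16.75B at 268 GB). A fully sharded 8-GPU node holds about 90B (134B at 268 GB). Activation memory for long sequences comes on top of this and has to be budgeted separately.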

Structure-Aware Curriculum Learning for LLMs via Manifold Bandits

23 June 2026
AI Models, Claude AI

Structured curriculum learning strategies that exploit task relationships in latent space achieve better downstream performance than prioritizing tasks by difficulty alone.
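
The teaser does not spell out the algorithm. As a rough illustration of the general idea only, the sketch below is a generic kernel-smoothed UCB bandit, not the paper's "manifold bandit". Each arm is a task cluster with a hypothetical latent embedding. The reward is an assumed learning-progress signal, for example the drop in validation loss after training on that cluster. Each observed reward is shared with nearby clusters through a Gaussian similarity kernel, so related tasks inform each other.

```python
import math
from typing import List, Sequence


def rbf(a: Sequence[float], b: Sequence[float], bandwidth: float) -> float:
    """Gaussian similarity between two task embeddings (1.0 for identical points)."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2.0 * bandwidth ** 2))


class KernelUCBCurriculum:
    """Picks which task cluster to train on next; rewards spread to similar clusters."""

    def __init__(self, embeddings: List[List[float]], c: float = 1.0, bandwidth: float = 1.0):
        self.emb = embeddings
        self.c = c              # exploration strength
        self.bw = bandwidth     # how far a reward propagates in latent space
        self.weight = [0.0] * len(embeddings)      # kernel-weighted pseudo-counts
        self.reward_sum = [0.0] * len(embeddings)  # kernel-weighted reward totals
        self.steps = 0

    def select(self) -> int:
        """Return the cluster with the highest optimistic (UCB) estimate of progress."""
        self.steps += 1
        best_arm, best_score = 0, -math.inf
        for i, w in enumerate(self.weight):
            if w == 0.0:
                return i        # no information at all yet: try this cluster
            mean = self.reward_sum[i] / w
            bonus = self.c * math.sqrt(math.log(self.steps) / w)
            if mean + bonus > best_score:
                best_arm, best_score = i, mean + bonus
        return best_arm

    def update(self, arm: int, reward: float) -> None:
        """Credit the observed learning progress to every cluster, weighted by similarity."""
        for j, e in enumerate(self.emb):
            k = rbf(self.emb[arm], e, self.bw)
            self.weight[j] += k
            self.reward_sum[j] += k * reward


curriculum = KernelUCBCurriculum([[0.0, 0.0], [0.0, 0.1], [5.0, 5.0]])
first = curriculum.select()
curriculum.update(first, reward=1.0)
print(curriculum.select())
```

After one reward on the first cluster, its close neighbor has also effectively been observed. The distant cluster has received almost no shared credit, so its exploration bonus dominates and it is chosen next. A pure difficulty-ordered curriculum has no such notion of task proximity.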

Uniform FP4: New 4-Bit Training Method for LLMs Reduces Systematic Errors

19 June 2026
AI Models, Claude Code

Uniform 4-bit formats eliminate the systematic shrinkage bias that the E2M1 format introduces in FP4 LLM training and yield consistently better convergence across all tested model sizes.
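
The shrinkage claim can be explored with a toy experiment. The sketch below quantizes blocks of Gaussian values onto two grids, using per-block absmax scaling and round-to-nearest. The first is the standard E2M1 magnitude grid {0, 0.5, 1, 1.5, 2, 3, 4, 6}. The second is a uniform symmetric 4-bit grid with integer levels from -7 to 7. The uniform grid, block size, and magnitude-ratio diagnostic are illustrative assumptions, not the paper's exact format or metric. One plausible intuition: E2M1 spacing widens toward the top of its range, and bell-shaped data puts more mass near the lower end of each wide interval, so nearest rounding can send more values down than up.

```python
import random
from typing import List, Sequence

# Magnitudes representable by the E2M1 (FP4) element format.
E2M1_POS = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_GRID = sorted(set(E2M1_POS) | {-v for v in E2M1_POS})  # 15 distinct values
# Hypothetical uniform 4-bit grid: evenly spaced integer levels from -7 to 7.
UNIFORM_GRID = [float(k) for k in range(-7, 8)]


def quantize_block(xs: Sequence[float], grid: Sequence[float]) -> List[float]:
    """Absmax-scale a block onto the grid, round to the nearest level, then rescale."""
    amax = max(abs(x) for x in xs)
    if amax == 0.0:
        return list(xs)
    scale = amax / max(grid)  # the largest |x| maps exactly onto the top grid level
    return [min(grid, key=lambda g: abs(x / scale - g)) * scale for x in xs]


def magnitude_ratio(xs: Sequence[float], grid: Sequence[float]) -> float:
    """Sum of |q(x)| over sum of |x|; a value below 1 means net shrinkage toward zero."""
    q = quantize_block(xs, grid)
    return sum(abs(v) for v in q) / sum(abs(x) for x in xs)


rng = random.Random(0)
blocks = [[rng.gauss(0.0, 1.0) for _ in range(32)] for _ in range(200)]
for name, grid in (("E2M1", E2M1_GRID), ("uniform", UNIFORM_GRID)):
    ratios = [magnitude_ratio(b, grid) for b in blocks]
    print(name, sum(ratios) / len(ratios))
```

The two printed averages give only a rough, data-dependent view of the rounding bias. The convergence claim in the summary concerns full training runs, which this toy does not simulate.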

STARE: Token-Level Stability Procedure Against Policy Entropy Collapse in GRPO Training

19 June 2026
AI Models, Claude AI

STARE uses surprisal metrics and selective advantage reweighting to keep policy entropy stable across long training sequences while improving accuracy by 4–8%.
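
For readers new to GRPO, the sketch below shows where a token-level reweighting can plug in. Group-normalized advantages are the standard GRPO recipe. The surprisal-based weighting rule is a hypothetical stand-in chosen for illustration: it shrinks positive advantages on tokens the policy already predicts with high confidence. The `tau` and `floor` parameters are invented, and this is not STARE's published procedure.

```python
import math
import statistics
from typing import List


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: normalize each response's reward within its sampled group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def reweight_by_surprisal(adv: float, token_logprobs: List[float],
                          tau: float = 1.0, floor: float = 0.1) -> List[float]:
    """Hypothetical token-level reweighting of one response's advantage.

    Surprisal is -log p(token). With a positive advantage, low-surprisal tokens
    (already near probability 1) get a weight between `floor` and 1, so the update
    does not keep sharpening them. Negative advantages are left unchanged.
    """
    weighted = []
    for lp in token_logprobs:
        surprisal = -lp
        w = max(floor, min(1.0, surprisal / tau)) if adv > 0 else 1.0
        weighted.append(adv * w)
    return weighted


rewards = [1.0, 0.0, 1.0, 0.0]  # e.g. correct / incorrect answers within one group
advs = group_advantages(rewards)
print(reweight_by_surprisal(advs[0], [math.log(0.99), math.log(0.3), math.log(0.05)]))
```

Pushing already-confident tokens toward probability 1 is one way entropy can collapse. Damping those updates while leaving uncertain tokens at full weight is the kind of trade-off a token-level method can make, whatever rule it actually uses.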

Socratic-SWE: Self-Learning AI Agents for Code Repair

8 June 2026
AI Models, Claude Code

A self-learning framework for code-repair agents uses the agents' own solution traces to generate targeted training tasks, achieving higher accuracy than previous approaches.

OPRD: Representation Distillation with Hidden States Outperforms Output-Only Methods

5 June 2026
AI Models, Claude Code

Hidden-state alignment reduces sampling variance, closes the student-teacher gap more effectively, and trains with less memory and compute time than output-only distillation.
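
A minimal sketch of hidden-state alignment, assuming a FitNets-style setup. A linear projection maps student hidden states to the teacher's width, and the loss is the mean squared error to the teacher's hidden states. The projection matrix `W`, the per-dimension averaging, and the toy values are assumptions for illustration. OPRD's exact loss, layer choice, and projection are not given in the teaser. An output-only KL term is included for contrast.

```python
import math
from typing import List, Sequence

Vector = List[float]


def project(W: Sequence[Sequence[float]], h: Sequence[float]) -> Vector:
    """Matrix-vector product W @ h, mapping a student state to the teacher's width."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def hidden_state_loss(student: Sequence[Vector], teacher: Sequence[Vector],
                      W: Sequence[Sequence[float]]) -> float:
    """Mean over tokens of the per-dimension squared error ||W h_s - h_t||^2 / d_teacher."""
    total = 0.0
    for h_s, h_t in zip(student, teacher):
        p = project(W, h_s)
        total += sum((a - b) ** 2 for a, b in zip(p, h_t)) / len(h_t)
    return total / len(student)


def output_kl(p_teacher: Sequence[float], p_student: Sequence[float]) -> float:
    """For contrast: KL(teacher || student) on one next-token distribution (output-only signal)."""
    return sum(p * math.log(p / q) for p, q in zip(p_teacher, p_student) if p > 0)


# Student width 2, teacher width 3; in practice W is trained jointly with the student.
W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
student_states = [[1.0, 2.0], [0.0, 1.0]]
teacher_states = [[1.0, 2.0, 1.5], [0.0, 1.0, 0.0]]
print(hidden_state_loss(student_states, teacher_states, W))
print(output_kl([0.7, 0.3], [0.6, 0.4]))
```

The hidden-state loss supervises every dimension of the representation at every token. The output-only KL sees only the final next-token distribution, which is the narrower signal the summary compares against.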

Lumi AI News
