
LLMs Learn Through Sleep: Self-Optimization and Knowledge Consolidation

The bottom line: A new training paradigm enables LLMs to autonomously integrate in-context knowledge into their parameters and continue developing without human supervision.

Researchers introduce a “sleep” mechanism that empowers language models to learn continuously and convert short-term insights into stable long-term capabilities. The system combines knowledge distillation with reinforcement learning for autonomous self-improvement.

The central limitation of existing large language models is that they cannot learn continuously or permanently integrate temporary knowledge from the context window into their weights. Current LLMs perform well on tasks with immediate feedback or in-context learning, but they cannot retain those insights beyond the current context or generalize from them.

The proposed “Sleep” paradigm consists of two phases. (1) Memory Consolidation works through “Knowledge Seeding”: a distillation process that transfers knowledge from a smaller model to a larger network, adding capacity while preserving what has already been learned. The implementation combines on-policy distillation with RL-based imitation learning. (2) Dreaming is a self-improvement phase in which the model uses reinforcement learning to generate a curriculum of synthetic data, which it then trains on to acquire new knowledge and refine existing capabilities, all without external annotation.
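To make the distillation step in phase (1) more concrete, here is a minimal sketch of a per-token distillation loss. The summary does not state which loss the researchers use, so the reverse KL divergence below is an assumption (it is one common choice for on-policy distillation), and the function names `softmax`, `reverse_kl`, and `distillation_loss` are illustrative, not taken from the paper. “Student” means the model being updated and “teacher” the model whose knowledge is transferred; in the setup described above, the teacher would be the smaller model.

```python
import math


def softmax(logits):
    """Convert raw logits into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reverse_kl(student_logits, teacher_logits):
    """KL(student || teacher) for one token position (assumed loss)."""
    p = softmax(student_logits)
    q = softmax(teacher_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_steps, teacher_steps):
    """Average the per-token divergence over a sequence.

    In on-policy distillation the sequence is sampled from the student
    itself, and the teacher only scores the positions the student visited.
    """
    per_token = [reverse_kl(s, t) for s, t in zip(student_steps, teacher_steps)]
    return sum(per_token) / len(per_token)


# Toy example: two token positions over a three-word vocabulary.
student = [[2.0, 0.5, 0.1], [0.3, 1.7, 0.2]]
teacher = [[1.8, 0.6, 0.2], [0.1, 2.0, 0.4]]
print(distillation_loss(student, teacher))
```

The loss is zero when the two models agree at every position and grows as their predictions diverge. A real training loop would minimize it with gradient descent over model parameters, which this toy version leaves out.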

The experiments reportedly show advantages in long-horizon sequence tasks, continual learning, knowledge integration, and few-shot generalization. For CTOs, this could mean fewer training cycles and lower retraining costs, since models could learn from their own error patterns after deployment without a full retraining run on the original data.


Source: arxiv.org · Published June 1, 2026
Lumi AI News — AI-assisted curation per Art. 50 EU AI Act. Paraphrase and classification via Lumi News Pipeline v1.2.9.
