The Point: A 3-billion-parameter model scores 94.3 on AIME26 and 80.2 on LiveCodeBench v6, performance that rivals systems a hundredfold larger on mathematical and code benchmarks.

With VibeThinker-3B, researchers show that a language model with only 3 billion parameters can compete with far larger models on formal reasoning tasks. The result challenges common assumptions about how large a model must be to perform high-quality logical inference.

VibeThinker-3B is a compact 3-billion-parameter language model built on the Spectrum-to-Signal post-training paradigm. Its training pipeline, rather than its architecture, is what sets it apart: it combines curriculum-based supervised fine-tuning, multi-domain reinforcement learning, and offline self-distillation to improve reasoning capabilities.
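
The article does not describe how the curriculum is built. The sketch below shows one way curriculum-based fine-tuning could be organized: sort examples from easy to hard and split them into stages that a trainer consumes in order. It assumes each SFT example carries a precomputed difficulty score. The `Example` type, its `difficulty` field, and `curriculum_stages` are hypothetical names; the source does not say whether or how VibeThinker-3B measures difficulty.

```python
from dataclasses import dataclass


@dataclass
class Example:
    # Hypothetical SFT record; the difficulty score and its scale are assumptions.
    prompt: str
    response: str
    difficulty: float


def curriculum_stages(examples: list[Example], num_stages: int) -> list[list[Example]]:
    """Partition examples into num_stages disjoint buckets of increasing difficulty.

    A trainer would fine-tune on bucket 0 first, then bucket 1, and so on.
    Bucket sizes differ by at most one when len(examples) is not divisible.
    """
    if num_stages < 1:
        raise ValueError("num_stages must be >= 1")
    ordered = sorted(examples, key=lambda ex: ex.difficulty)  # easy -> hard
    n = len(ordered)
    bounds = [n * i // num_stages for i in range(num_stages + 1)]
    return [ordered[bounds[i]:bounds[i + 1]] for i in range(num_stages)]
```

Whether later stages should also replay easier examples, and how many stages to use, are design choices the source does not address.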

On standardized benchmarks, the model reports strong results:

- 94.3 on AIME26, which draws on problems from the American Invitational Mathematics Examination, a demanding mathematics competition. With test-time scaling, the score rises to 97.1.
- 80.2 Pass@1 on LiveCodeBench v6, a code-generation benchmark.
- A 96.1 percent acceptance rate on recent LeetCode contests the model had not seen during training.

Its IFEval score of 93.4 indicates that this reasoning performance does not come at the expense of instruction following. These scores match or exceed those of flagship systems such as DeepSeek V3.2, GLM-5, and Gemini 3 Pro, which are on the order of a hundred times larger.
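
A short sketch can make Pass@1 and test-time scaling concrete. For scoring, it assumes the widely used unbiased pass@k estimator introduced with OpenAI's HumanEval benchmark. Here n answers are sampled per problem and c of them are correct, and a benchmark score is the average over problems. For test-time scaling, it uses majority voting over sampled final answers purely as one illustrative method. The article states neither the sample counts nor the method behind the 97.1 AIME26 score, and the function names below are hypothetical.

```python
from collections import Counter
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples generated, c of them correct.

    Returns the probability that at least one of k samples, drawn without
    replacement from the n generated ones, is correct.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average per-problem pass@k over a list of (n, c) pairs."""
    if not results:
        raise ValueError("results must be non-empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


def majority_vote(answers: list[str]) -> str:
    """Illustrative test-time scaling: return the most frequent final answer.

    Counter.most_common orders equal counts by first occurrence,
    so ties go to the answer seen earliest.
    """
    if not answers:
        raise ValueError("answers must be non-empty")
    return Counter(answers).most_common(1)[0][0]
```

With k = 1 the estimator reduces to c / n, the fraction of correct samples. Majority voting spends extra inference compute to get a more reliable final answer. That suits AIME, where each problem has a single integer answer.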

VibeThinker-3B: Verifiable Reasoning in Compact Language Models

Lumi AI News