Claude Opus 4.7 completes complex robotics tasks without human assistance 37 times faster than human teams managed a year earlier, and in most cases its code works correctly on the first attempt.
GLM-5.2 ranks as the leading open language model on the Artificial Analysis Index with a score of 51 and places 2nd in the Code Arena WebDev Leaderboard, but produces significantly more output tokens than competing models.
Aligning router rows with the principal singular directions of their associated expert matrices improves the efficiency and stability of Mixture-of-Experts models.
DiffusionGemma denoises up to 256 tokens in parallel per step rather than generating them sequentially, reaching 1,000 tokens/second on an NVIDIA H100 at batch size 1 with no cloud dependency.
Optical reasoning uses images as the primary reasoning medium, saving an average of 28.57 percent of tokens on language tasks and 16 percent on multimodal tasks.
Fable 5 sets new benchmarks in software engineering and knowledge work through extended autonomous runtimes, while Mythos 5 offers cybersecurity capabilities without security restrictions.
Microsoft has introduced MAI-Thinking-1, its first reasoning model that enterprises can fine-tune, designed specifically for domain-specific customization.
A new training paradigm enables LLMs to autonomously integrate in-context knowledge into their parameters and continue developing without human supervision.
A 20B search agent achieves 0.730 average curated recall across eight benchmarks by applying RL training to explicit state rather than folding state management into the policy.
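
As a rough illustration of the router-alignment item above, the sketch below scores how closely one router row lines up with the principal singular direction of its expert's weight matrix. The matrix layout (rows are hidden units, columns are model dimensions), the function names, and the cosine-based score are assumptions made for illustration; the summarized work may define and enforce alignment differently.

```python
import math
import random


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def normalize(v):
    # Assumes v is nonzero; a zero vector would divide by zero here.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def principal_input_direction(w, iters=200, seed=0):
    """Approximate the top right singular vector of w.

    Uses power iteration on w^T w, so no linear-algebra library is needed.
    Hypothetical helper: assumes w is nonzero with a unique largest singular value.
    """
    rng = random.Random(seed)
    wt = transpose(w)
    v = normalize([rng.gauss(0.0, 1.0) for _ in range(len(w[0]))])
    for _ in range(iters):
        v = normalize(matvec(wt, matvec(w, v)))
    return v


def alignment(router_row, w):
    """Absolute cosine between a router row and the expert's principal direction.

    Illustrative score only: 1.0 means fully aligned, 0.0 means orthogonal.
    """
    v = principal_input_direction(w)
    r = normalize(router_row)
    return abs(sum(a * b for a, b in zip(r, v)))


# Toy expert whose dominant input direction is the first coordinate axis.
w = [[3.0, 0.0], [0.0, 1.0]]
print(alignment([2.0, 0.0], w))  # close to 1.0: router row matches the direction
print(alignment([0.0, 5.0], w))  # close to 0.0: router row is orthogonal to it
```

A training setup could, under the same assumptions, penalize `1 - alignment(...)` for each expert; whether that matches the actual objective is not stated in the summary.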