InfoKV compresses the KV cache by combining attention scores with uncertainty signals. On long reasoning tasks it outperforms methods that rely on attention scores alone.
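To show how a second signal can change which cache entries survive compression, here is a minimal sketch. It is not InfoKV's actual method. It assumes a hypothetical scoring rule that takes a weighted sum of a per-position attention score and a per-position uncertainty score, using an assumed weight `alpha`, and then keeps the top `budget` positions. The function names and the weighting scheme are illustrative only.

```python
from typing import List


def combined_scores(attn: List[float], uncertainty: List[float], alpha: float = 0.5) -> List[float]:
    """Hypothetical importance score per cached position: a weighted mix of
    attention and uncertainty (the weighting is an assumption, not InfoKV's rule)."""
    if len(attn) != len(uncertainty):
        raise ValueError("attn and uncertainty must have the same length")
    return [alpha * a + (1.0 - alpha) * u for a, u in zip(attn, uncertainty)]


def select_kv_positions(attn: List[float], uncertainty: List[float],
                        budget: int, alpha: float = 0.5) -> List[int]:
    """Return the indices of the cache entries to keep, in original token order."""
    scores = combined_scores(attn, uncertainty, alpha)
    # Rank positions from most to least important, then keep the top `budget`.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget])


attn = [0.50, 0.05, 0.30, 0.10, 0.05]
uncertainty = [0.10, 0.80, 0.05, 0.60, 0.20]

print(select_kv_positions(attn, uncertainty, budget=3))             # mixed score
print(select_kv_positions(attn, uncertainty, budget=3, alpha=1.0))  # attention only
```

With `alpha=1.0` the sketch reduces to pure attention-based selection and keeps positions `[0, 2, 3]`. With the mixed score it keeps `[0, 1, 3]` instead, because it retains a low-attention but high-uncertainty entry (position 1).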
SEVRA uses selective verification to cut inference token usage by 26 to 91 percent without compromising accuracy. The work also finds that longer initial solution attempts can, in some cases, be more cost-effective.
A new benchmark pinpoints the exact point at which medical AI models begin to hallucinate, enabling targeted countermeasures through trace-supervised fine-tuning.
Microsoft has introduced MAI-Thinking-1, its first reasoning model that enterprises can fine-tune, designed specifically for domain-specific customization.