AWS has developed P-EAGLE, a parallelized variant of speculative decoding whose drafter generates all draft tokens in a single forward pass instead of one at a time, achieving up to 1.69x higher inference throughput on SageMaker AI.
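For context, the sketch below shows the generic greedy accept/reject step that speculative decoding relies on: a drafter proposes k tokens, the target model checks them, and the longest matching prefix is kept along with one token from the target. It does not reproduce P-EAGLE's drafter architecture. `draft_k` and `target_next` are hypothetical stand-ins (the drafter returns all k drafts from one call, mirroring the single-pass idea), and the toy "models" simply count upward.

```python
from typing import Callable, List

Token = int


def speculative_step(
    prefix: List[Token],
    draft_k: Callable[[List[Token], int], List[Token]],
    target_next: Callable[[List[Token]], Token],
    k: int,
) -> List[Token]:
    """One round of greedy speculative decoding (illustrative sketch).

    draft_k: hypothetical drafter returning k proposed tokens in a single call.
    target_next: the target model's greedy next-token choice for a context.
    Returns the tokens accepted this round (always at least one).
    """
    drafts = draft_k(prefix, k)
    accepted: List[Token] = []
    ctx = list(prefix)
    # A real system scores all k draft positions in one batched target pass;
    # here the target is queried position by position for readability.
    for tok in drafts:
        expected = target_next(ctx)
        if tok != expected:
            # First mismatch: keep the target's own token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)
    # Every draft matched, so the target's next token is a free bonus.
    accepted.append(target_next(ctx))
    return accepted


# Toy usage: the "target" always predicts previous token + 1.
def target_next(ctx: List[Token]) -> Token:
    return ctx[-1] + 1


def perfect_drafter(ctx: List[Token], k: int) -> List[Token]:
    return [ctx[-1] + i + 1 for i in range(k)]


def flawed_drafter(ctx: List[Token], k: int) -> List[Token]:
    drafts = perfect_drafter(ctx, k)
    drafts[2] = 99  # deliberately wrong third draft (requires k >= 3)
    return drafts


print(speculative_step([0], perfect_drafter, target_next, k=4))  # [1, 2, 3, 4, 5]
print(speculative_step([0], flawed_drafter, target_next, k=4))   # [1, 2, 3]
```

Because one verification round can emit up to k + 1 tokens, throughput depends on how often drafts are accepted and how cheap drafting is; a drafter that produces all k drafts in one pass attacks the second factor. The toy numbers here say nothing about the 1.69x figure reported for P-EAGLE.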
Post-training is shifting from monolithic RL pipelines to decentralized specialist systems that are merged into a generalist student through on-policy distillation, a scaling pattern that resolves capability conflicts across domains.
AI realizes its full potential in product development only when it has systematic access to product data across the entire lifecycle, not as an isolated tool but as an integrated component of a continuous lifecycle platform.
Tangram makes the memory budget of each attention head statically predictable, eliminating the fragmentation and latency overhead caused by dynamic KV-cache compression.
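To illustrate what a statically predictable per-head budget means, here is a minimal sketch assuming each head gets a fixed number of preallocated KV slots and evicts by an importance score once full. The class name, the `score` argument, and the eviction rule are hypothetical choices for illustration, not Tangram's actual design; the point is that buffers are allocated once, so total KV memory can be computed before decoding starts.

```python
import numpy as np


class StaticHeadKVCache:
    """KV cache for one attention head with a fixed, preallocated slot budget.

    Hypothetical sketch: the eviction rule (replace the least important slot,
    or drop the new entry if it is no more important) is an assumption.
    """

    def __init__(self, budget: int, head_dim: int, dtype=np.float16):
        self.budget = budget
        # Allocated once; memory use never grows or fragments at runtime.
        self.keys = np.zeros((budget, head_dim), dtype=dtype)
        self.values = np.zeros((budget, head_dim), dtype=dtype)
        self.scores = np.full(budget, -np.inf)  # per-slot importance; -inf = empty
        self.size = 0

    def insert(self, k: np.ndarray, v: np.ndarray, score: float) -> None:
        if self.size < self.budget:
            slot = self.size
            self.size += 1
        else:
            slot = int(np.argmin(self.scores))
            if score <= self.scores[slot]:
                return  # new entry is no more important than the weakest slot
        # Overwrite in place: no reallocation, so the footprint stays fixed.
        self.keys[slot] = k
        self.values[slot] = v
        self.scores[slot] = score

    def nbytes(self) -> int:
        return self.keys.nbytes + self.values.nbytes


def total_kv_bytes(num_layers, num_kv_heads, budget, head_dim, bytes_per_elem=2):
    # Keys plus values for every KV head in every layer, known ahead of time.
    return 2 * num_layers * num_kv_heads * budget * head_dim * bytes_per_elem


cache = StaticHeadKVCache(budget=4, head_dim=8)
for i, s in enumerate([0.5, 0.1, 0.9, 0.3, 0.7, 0.05]):
    cache.insert(np.full(8, i), np.full(8, i), score=s)
print(cache.size, sorted(cache.scores.tolist()), cache.nbytes())
# 4 [0.3, 0.5, 0.7, 0.9] 128

# Hypothetical model shape: 32 layers, 32 KV heads, 1024 slots, head_dim 128, fp16.
print(total_kv_bytes(32, 32, 1024, 128) / 2**20)  # 512.0 (MiB)
```

The trade-off is that a fixed budget may keep fewer tokens than an adaptive scheme would for some heads, in exchange for a memory footprint that is constant and known in advance.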
Dedicated exploration models (4B–30B parameters) can handle code search in repositories more efficiently than general-purpose solver models while significantly reducing context pollution.
Poisoned documents can turn reasoning-based AI guardrails into denial-of-service (DoS) weapons by using the security systems themselves as resource sinks. This new attack vector carries concentration risks for shared governance infrastructure.
Attackers can exploit the reasoning guardrails of AI agents with deliberately manipulated inputs to cause resource exhaustion, without having to bypass the security mechanisms themselves.
HarnessX automates the assembly and adaptation of agent harnesses from execution traces, achieving an average performance improvement of +14.5% without scaling the model.
A new benchmark identifies the exact point at which medical AI models produce hallucinations, enabling targeted countermeasures through trace-supervised fine-tuning.