The new API lets developers apply safeguards at any point in a multi-turn agent loop and define custom thresholds on numerical scores, each mapped to an action such as block, bypass, or retry.
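The source does not name the API's calls, so the following is only a hypothetical sketch of the threshold-and-action idea. Every name here (`Action`, `ThresholdRule`, `decide`) is invented for illustration, and treating a score below every threshold as a pass-through (`BYPASS`) is an assumption about what "bypass" means.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Action(Enum):
    # Hypothetical action set mirroring the three actions named above (not the real API).
    BLOCK = "block"
    BYPASS = "bypass"
    RETRY = "retry"


@dataclass(frozen=True)
class ThresholdRule:
    # Hypothetical rule: fire `action` when the safeguard score is >= `min_score`.
    min_score: float
    action: Action


def decide(score: float, rules: List[ThresholdRule], default: Action = Action.BYPASS) -> Action:
    """Return the action of the highest threshold the score reaches.

    Rules are checked from the strictest (highest) threshold down, so a score
    that clears several thresholds gets the most severe matching action.
    Assumption: scores below every threshold fall through to `default`.
    """
    for rule in sorted(rules, key=lambda r: r.min_score, reverse=True):
        if score >= rule.min_score:
            return rule.action
    return default


# Example policy for one checkpoint in an agent loop (threshold values are made up).
policy = [
    ThresholdRule(min_score=0.9, action=Action.BLOCK),
    ThresholdRule(min_score=0.6, action=Action.RETRY),
]
```

Because each checkpoint takes its own rule list, a loop could apply a stricter policy to tool outputs than to user turns, which is one reading of "at every point" in the loop.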
AWS has developed P-EAGLE, a parallelized variant of speculative decoding that generates all draft tokens in a single forward pass rather than one at a time, improving inference throughput by up to 1.69x on SageMaker AI.
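P-EAGLE changes how draft tokens are produced, but the target model still has to verify them. The sketch below shows generic greedy verification from standard speculative decoding, not P-EAGLE's code, and it assumes a hypothetical `target` callable that returns the target model's greedy next token for a given prefix.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for the target model: prefix -> greedy next token.
NextToken = Callable[[Sequence[int]], int]


def verify_draft(prefix: Sequence[int], draft: Sequence[int], target: NextToken) -> List[int]:
    """Greedy speculative-decoding verification.

    Accepts the longest prefix of `draft` that matches the target model's own
    greedy predictions, then appends one token from the target: either the
    correction at the first mismatch, or a bonus token if every draft matched.
    A real system scores all draft positions in one batched target pass; this
    loop queries position by position only for readability.
    """
    accepted: List[int] = []
    context = list(prefix)
    for tok in draft:
        expected = target(context)
        if tok != expected:
            accepted.append(expected)  # target's correction replaces the rejected draft token
            return accepted
        accepted.append(tok)
        context.append(tok)
    accepted.append(target(context))  # all drafts accepted: take one bonus token
    return accepted
```

Every verification step emits at least one token, so the speedup depends on how many draft tokens are accepted per target pass; drafting them in one pass, as P-EAGLE does, reduces the cost of producing the draft itself.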
Tangram gives each attention head a statically predictable memory budget, eliminating the fragmentation and latency overhead that dynamic KV-cache compression introduces.
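One way to picture a statically predictable per-head budget is a cache whose capacity is fixed when it is created, so total memory is known before inference starts. This is a hypothetical sketch, not Tangram's design: the class name, the lowest-score eviction rule, and the byte-count helper are assumptions, and the cache tracks token positions rather than actual key/value tensors.

```python
import heapq
from typing import List, Tuple


class FixedBudgetHeadCache:
    """Per-head cache bookkeeping with a hard token budget (hypothetical).

    Each head keeps at most `budget_per_head` entries, so its footprint never
    grows past a size known at construction time. When a head is full, the
    entry with the lowest importance score is evicted; this scoring rule is an
    assumption made for illustration.
    """

    def __init__(self, num_heads: int, budget_per_head: int) -> None:
        self.budget = budget_per_head
        # One min-heap of (score, position) per head; length never exceeds budget.
        self.heads: List[List[Tuple[float, int]]] = [[] for _ in range(num_heads)]

    def insert(self, head: int, position: int, score: float) -> None:
        heap = self.heads[head]
        if len(heap) < self.budget:
            heapq.heappush(heap, (score, position))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, position))  # evict the least important entry
        # Otherwise the new token is dropped, so the budget is never exceeded.

    def kept_positions(self, head: int) -> List[int]:
        return sorted(pos for _, pos in self.heads[head])


def static_cache_bytes(num_layers: int, num_heads: int, budget_per_head: int,
                       head_dim: int, bytes_per_elem: int = 2) -> int:
    # K and V each store head_dim values per cached token, per head, per layer.
    return num_layers * num_heads * budget_per_head * head_dim * 2 * bytes_per_elem
```

For example, 32 layers, 8 KV heads, 1,024 tokens per head, and 128-dimensional heads in 16-bit precision give `static_cache_bytes(32, 8, 1024, 128)` = 134,217,728 bytes (128 MiB), independent of the input.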
Dedicated exploration models (4B–30B parameters) can handle repository code search more efficiently than general-purpose solver models while substantially reducing context pollution.
HarnessX automates the assembly and adaptation of agent harnesses from execution traces, improving performance by an average of +14.5% without scaling the underlying model.
Agent-EvalKit automates AI agent evaluation directly in the development environment, combining structured test-case generation, observability instrumentation, and both code-based and LLM-based metrics.
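As a rough illustration of mixing code-based and LLM-based metrics, the sketch scores each output with a deterministic exact-match check and a judge callable, then averages a weighted blend over the test suite. Agent-EvalKit's actual metrics and API are not described here; `EvalCase`, `combined_score`, `evaluate`, the 50/50 default weight, and the `judge` callable are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical LLM-as-judge: (prompt, output) -> score in [0, 1].
Judge = Callable[[str, str], float]


@dataclass(frozen=True)
class EvalCase:
    prompt: str
    expected: str


def exact_match(output: str, case: EvalCase) -> float:
    # Code-based metric: deterministic, cheap, and reproducible.
    return 1.0 if output.strip() == case.expected.strip() else 0.0


def combined_score(output: str, case: EvalCase, judge: Judge, code_weight: float = 0.5) -> float:
    # Weighted blend of the code-based check and the LLM-based judgment.
    return code_weight * exact_match(output, case) + (1.0 - code_weight) * judge(case.prompt, output)


def evaluate(outputs: List[str], cases: List[EvalCase], judge: Judge) -> float:
    # Mean combined score over the suite; outputs[i] answers cases[i].
    scores = [combined_score(o, c, judge) for o, c in zip(outputs, cases)]
    return sum(scores) / len(scores)
```

The code-based check catches objective failures cheaply, while the judge covers qualities such as helpfulness that string comparison cannot measure.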
Aligning router rows with the principal singular directions of their associated expert matrices improves the efficiency and stability of Mixture-of-Experts models.
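A minimal sketch of the alignment, under several assumptions not stated in the source: each expert matrix has shape (d_ff × d_model) and acts on the same input x as the router, "principal singular direction" means the top right singular vector (the input direction the expert amplifies most), and alignment is a one-shot assignment rather than an initialization, constraint, or regularizer. Power iteration stands in for a library SVD to keep the example dependency-free, and `align_router` is a hypothetical name.

```python
import math
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def normalize(v: Vector) -> Vector:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def top_right_singular_vector(w: Matrix, iters: int = 200, seed: int = 0) -> Vector:
    """Power iteration on W^T W, which converges to W's principal right singular
    vector when the top two singular values are distinct."""
    rng = random.Random(seed)
    v = normalize([rng.uniform(-1.0, 1.0) for _ in range(len(w[0]))])
    wt = transpose(w)
    for _ in range(iters):
        v = normalize(matvec(wt, matvec(w, v)))
    return v


def align_router(experts: List[Matrix], scale: float = 1.0) -> Matrix:
    # Router row i points along the input direction expert i amplifies most.
    # Note: a singular vector's sign is arbitrary; a real method needs a sign convention.
    return [[scale * x for x in top_right_singular_vector(w)] for w in experts]
```

With rows aligned this way, the router logit r_i · x is largest for inputs along the direction that expert i responds to most strongly.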
The Claw-SWE-Bench framework shows that adapter design is critical for code agents: OpenClaw reaches 19.1% Pass@1 with a minimal adapter and 73.4% with a complete one.
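Pass@1 is the fraction of tasks solved on the first attempt. When several attempts are sampled per task, the widely used unbiased pass@k estimator generalizes it, and with k = 1 it reduces to c/n, the passing share of samples. The source does not say whether Claw-SWE-Bench samples more than once per task, so this is only a sketch of the metric, not the benchmark's scoring code.

```python
from math import comb
from typing import List, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one task: n samples drawn, c of them pass."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: List[Tuple[int, int]], k: int) -> float:
    # results holds one (n, c) pair per task; the benchmark score is the mean over tasks.
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With one sample per task, the benchmark score is simply the share of tasks solved, so 73.4% Pass@1 means roughly three in four tasks resolved on the first try.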
DiffusionGemma denoises up to 256 tokens in parallel per step rather than generating them one at a time, reaching 1,000 tokens/second on an NVIDIA H100 at batch size 1 without any cloud dependency.
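DiffusionGemma's actual denoising schedule is not described here, so this is a toy sketch of confidence-based parallel unmasking, a common scheme in masked diffusion language models. The `denoiser` callable and `tokens_per_step` parameter are hypothetical; the point is that committing k tokens per step cuts the step count from `length` to about `length / k`.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical denoiser: given the partially decoded sequence (None = masked),
# return a (token, confidence) guess for every position.
Denoiser = Callable[[List[Optional[int]]], List[Tuple[int, float]]]


def parallel_decode(length: int, denoiser: Denoiser, tokens_per_step: int) -> Tuple[List[int], int]:
    if tokens_per_step < 1:
        raise ValueError("tokens_per_step must be at least 1")
    seq: List[Optional[int]] = [None] * length
    steps = 0
    while any(t is None for t in seq):
        guesses = denoiser(seq)
        masked = [i for i, t in enumerate(seq) if t is None]
        # Commit the most confident guesses in parallel; the rest stay masked.
        masked.sort(key=lambda i: guesses[i][1], reverse=True)
        for i in masked[:tokens_per_step]:
            seq[i] = guesses[i][0]
        steps += 1
    return [t for t in seq if t is not None], steps
```

Setting `tokens_per_step=1` recovers one-token-per-step decoding, while a budget of 256 lets a 256-token block finish in a single step when the model is confident enough.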