ProCUA-SFT: Automatically Generated Training Data for Desktop Agents

17 June 2026
AI Models, Claude Code

Automatically synthesized training data improves desktop agents by 18.7 percentage points compared to previous approaches.

ZPPO: Teacher Models as Prompts Instead of Gradients

17 June 2026
AI Models, Claude AI

ZPPO integrates teacher models as prompt components instead of gradients, improving generalization in knowledge transfer to smaller models.

Amazon Bedrock: InvokeGuardrailChecks API for Agent-Based Applications

17 June 2026
AI Models, Claude Code

The new API enables granular application of safeguards at every point in multi-turn agent loops and allows defining custom thresholds and actions (block, bypass, retry) based on numerical scores.

P-EAGLE: Parallel Speculation for Faster LLM Inference on AWS SageMaker

16 June 2026
AI Models, Claude Code

AWS has developed P-EAGLE, a parallelized variant of speculative decoding that generates draft tokens in a single forward pass instead of sequentially, achieving inference throughput improvements of up to 1.69x on SageMaker AI.

Tangram: Static KV-Cache Compression for Faster Multi-Turn LLM Serving

16 June 2026
AI Models, Claude Code

Tangram gives each attention head a statically predictable memory budget, eliminating the fragmentation and latency overhead caused by dynamic KV-cache compression.

FastContext: Specialized Agents for Efficient Code Repository Exploration

16 June 2026
AI Models, Claude Code

Dedicated exploration models (4B–30B parameters) can handle code search in repositories more efficiently than general solver models while significantly reducing context pollution.

HarnessX: Automated Optimization of Agent Runtime Environments

15 June 2026
AI Models, Claude AI, Claude Code

HarnessX automates the assembly and adaptation of agent harnesses from execution traces, achieving an average performance improvement of 14.5% without model scaling.

Agent-EvalKit: Open-Source Evaluation for AI Agents in Claude Code

11 June 2026
AI Models, Claude AI, Claude Code

Agent-EvalKit automates the evaluation of AI agents through structured test-case generation, observability instrumentation, and a combination of code-based and LLM-based metrics, all directly in the development environment.

Mixture-of-Experts Router Optimized via Manifold Power Iteration

11 June 2026
AI Models, Claude Code

Aligning router rows with the principal singular directions of their associated expert matrices improves the efficiency and stability of Mixture-of-Experts models.

Claw-SWE-Bench: Benchmark for AI Agents on Code Tasks

11 June 2026
AI Models, Claude Code

The Claw-SWE-Bench framework demonstrates that adapter design is critical for code agents: OpenClaw achieves 19.1% Pass@1 with a minimal adapter and 73.4% with a complete adapter.

ICALens: Interpretability Method for Language Models Without Training Additional Autoencoders

11 June 2026
AI Models, Claude AI

ICA-based analysis enables rapid exploration of interpretable directions in language models without expensive training of additional autoencoders.

Google DeepMind DiffusionGemma: Parallel Text Generation on Local GPUs

10 June 2026
AI Models, Google

DiffusionGemma denoises up to 256 tokens in parallel per step rather than generating them sequentially, reaching 1,000 tokens/second on an NVIDIA H100 at batch size 1 with no cloud dependency.

Lumi AI News