LSA predicts relevant context sections in advance and retains only these in GPU memory, compressing the KV-cache by over 86 percent without sacrificing accuracy.
LCLMs compress KV-caches with an encoder-decoder architecture at ratios of up to 1:16, more efficiently than previous methods, while reducing peak memory consumption and processing time.
Encoder-decoder compressors with adaptive expansion improve on existing KV-cache compression methods in speed and memory efficiency without significant quality loss.
A developer deliberately placed sabotage code in jqwik 1.10.0 to manipulate AI agents into deleting code, revealing a new security vulnerability in the open-source software supply chain.
Invisible HTML comments in GitHub Issues could trick Claude Code AI into reading protected environment variables like ANTHROPIC_API_KEY due to insufficient restrictions on the Read tool.
Vector databases require permanent RAM allocation instead of persistent storage, causing operational costs many times higher than traditional database systems.
Apple uses Vision-LLMs for Siri integration without requiring changes to existing apps and provides Core AI PyTorch Extensions to enable developers to run custom models on Apple hardware.
RISE achieves similar accuracy to unbounded shell interaction within a limited interaction space, but reduces request costs to about one quarter and scales significantly better to large corpora.
AI agents function reliably only with comprehensive observability that reveals causal relationships in complex systems, not through language models alone.
A self-learning framework for code-repair agents leverages their solution traces directly to generate targeted training tasks, achieving higher accuracy than previous approaches.