OpenBioRQ: Benchmark for Agent-Based Biomedical Research Questions

26 June 2026
AI Models, Claude AI, Claude Code

OpenBioRQ reveals that agent-based AI models fail on approximately 40% of complex biomedical research questions and, paradoxically, stop using their tools on difficult tasks, precisely where those tools matter most.

ViQ: Discrete Visual Representations at Arbitrary Resolution

26 June 2026
AI Models, Claude Code

ViQ quantizes visual inputs at arbitrary resolutions into discrete representations, achieving 20–70% training acceleration compared to continuous image encodings.

Language Compression in LLMs: Output Optimization Saves Costs, Input Reduction Increases Them

26 June 2026
AI Models, Claude Code

Output compression effectively reduces inference costs, while input compression increases overall costs and degrades response quality.

Tool-Calling Failures Under Schema Constraints in Open-Weight LLMs

26 June 2026
AI Models, Claude Code

JSON schema constraints compile tool-call tokens into unreachable regions of token space, causing models to suppress function calls even though schema constraints and tool calling each work in isolation.

Agentic Overlays: Transforming REST APIs for Agent-to-Agent Communication

25 June 2026
Claude AI, Claude Code

Agentic Overlays are thin wrapper layers that convert REST APIs into A2A-capable agents without duplicating code, eliminating the need for parallel infrastructure.

GitHub Blocks Automatic Code Checkouts from Forked Pull Requests

25 June 2026
Claude Code, Cybersecurity

By default, GitHub blocks the automatic checkout of code from forked pull requests in privileged workflows to prevent attackers from stealing the GITHUB_TOKEN and environment variables.

CI/CD Vulnerability Cordyceps Threatens GitHub Repositories via Supply-Chain Attacks

24 June 2026
Claude Code, Cybersecurity

A critical CI/CD vulnerability called Cordyceps enables attackers to gain full control over repositories and compromise the supply chain of hundreds of open-source projects.

Claude Tag: Anthropic Brings Asynchronous Slack-Native Agents to Teams

24 June 2026
Claude AI, Claude Code, Claude Cowork

Claude Tag extends Claude from single-user chat to a proactive, multiplayer, Slack-native agent that coordinates tasks asynchronously and acts autonomously across channel boundaries.

EDV Framework Reduces Error Accumulation in Self-Learning LLM Agents

24 June 2026
AI Models, Claude Code

EDV combines multiple heterogeneous agents that generate diverse solution approaches with an independent verifier and a consensus mechanism, filtering out erroneous experiences before they are stored.

NatureBench: How Far Coding Agents Really Get on Scientific Tasks

24 June 2026
AI Models, Claude AI, Claude Code

AI agents exceed the baseline on only roughly 18 percent of genuine scientific tasks because they tend to reframe problems rather than solve them with true innovation.

ParallelKernelBench: Frontier LLMs Still Struggling with Fast Multi-GPU Kernels

23 June 2026
AI Models, Claude Code, OpenAI

Frontier LLMs solve fewer than one-third of 87 multi-GPU CUDA benchmark tasks, though some generated kernels still outperform public reference implementations.

GitHub Restricts actions/checkout Against Pwn Request Attacks

23 June 2026
Claude Code, Cybersecurity

GitHub restricts actions/checkout to prevent attackers from executing code with full workflow privileges via the pull_request_target trigger.

Lumi AI News
