OpenBioRQ: Benchmark for Agent-Based Biomedical Research Questions

26 June 2026
AI Models, Claude AI, Claude Code

OpenBioRQ shows that agent-based AI models fail on approximately 40% of complex biomedical research questions and, paradoxically, stop using their tools on the most difficult tasks, which is where those tools matter most.

OpenAI Proposes Mandatory Pre-Release Evaluations

4 June 2026
AI Models, EU AI Act, Regulation

OpenAI calls for mandatory federal evaluations before AI models are released but rejects regulatory approval requirements, positioning itself in a middle ground between voluntary commitments and strict government control.

Trump Signs AI Executive Order with Cybersecurity Focus

2 June 2026
AI Models, Cybersecurity

Trump strikes a compromise between AI innovation and cybersecurity by establishing voluntary national security reviews for advanced AI models without imposing licensing or pre-approval requirements.

ITBench-AA: Frontier Models Fall Short of 50-Percent Mark on Enterprise IT Tasks

1 June 2026
AI Models, Claude AI, Claude Code

Current frontier models achieve a success rate below 50 percent on the new ITBench-AA benchmark for agentic IT capabilities, revealing a significant gap between what the models can do and what production-ready autonomous IT work requires.

Lumi AI News
