OpenBioRQ reveals that agent-based AI models fail on approximately 40% of complex biomedical research questions and, paradoxically, stop using their tools on difficult tasks, precisely where those tools matter most.
OpenAI calls for mandatory federal evaluations before AI models are released but rejects regulatory approval requirements, positioning itself in a middle ground between voluntary commitments and strict government control.
Trump strikes a compromise between AI innovation and cybersecurity by establishing voluntary national security reviews for advanced AI models without imposing licensing or pre-approval requirements.
Current frontier models achieve a success rate below 50 percent on the new ITBench-AA benchmark for evaluating agentic IT capabilities, revealing a significant gap between model capabilities and production readiness for autonomous IT tasks.