OpenBioRQ reveals that agent-based AI models fail on approximately 40% of complex biomedical research questions and, paradoxically, stop using their tools on difficult tasks, which is exactly where those tools matter most.
ViQ quantizes visual inputs at arbitrary resolutions into discrete representations, achieving 20–70% training acceleration compared to continuous image encodings.
Compiled JSON schema constraints push tool-call tokens into unreachable regions of token space, causing models to suppress function calls even though both features, schema-constrained output and tool calling, work in isolation.
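One way a compiled constraint can make a tool call unreachable is sketched below in Python. This is a toy illustration, not any provider's actual decoder: the vocabulary, the `<tool_call>` token, the logits, and the set `ALLOWED_AT_START` are all assumptions made up for the example.

```python
import math

# Hypothetical toy vocabulary; "<tool_call>" stands in for a special token
# that would open a function call.
VOCAB = ["{", "}", '"answer"', ":", '"42"', "<tool_call>"]

# Assumed result of compiling a JSON schema: the only token the grammar
# permits at the first decoding step.
ALLOWED_AT_START = {"{"}


def apply_mask(logits, allowed):
    """Set the logit of every token outside `allowed` to -inf."""
    return [
        logit if token in allowed else -math.inf
        for token, logit in zip(VOCAB, logits)
    ]


def greedy_pick(logits):
    """Return the token with the highest logit."""
    best_index = max(range(len(logits)), key=logits.__getitem__)
    return VOCAB[best_index]


# Made-up logits in which the model strongly prefers to open a tool call.
raw_logits = [1.0, 0.1, 0.2, 0.1, 0.3, 5.0]

unconstrained = greedy_pick(raw_logits)
constrained = greedy_pick(apply_mask(raw_logits, ALLOWED_AT_START))
```

In this sketch the mask is applied before token selection, so the tool-call token can never be chosen once the schema is active, no matter how strongly the model prefers it.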
Agentic Overlays are thin wrapper layers that convert REST APIs into A2A-capable agents without code duplication, eliminating the need for parallel infrastructure.
By default, GitHub blocks privileged workflows from automatically loading code from forked pull requests, which prevents attackers from stealing the GITHUB_TOKEN and environment variables.
A critical CI/CD vulnerability called Cordyceps enables attackers to gain full control over repositories and compromise the supply chain of hundreds of open-source projects.
Claude Tag extends Claude from a single-user chat into a proactive, multiplayer, Slack-native agent that coordinates tasks asynchronously and acts autonomously across channel boundaries.
EDV combines three components: multiple heterogeneous agents that generate diverse solution approaches, an independent verifier, and a consensus mechanism, so that erroneous experiences are filtered out before they are stored.
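The generate, verify, and agree pipeline described for EDV can be sketched in Python as follows. The summary does not specify how EDV implements its verifier or consensus rule, so everything here is an assumption for illustration: the three toy agents, the evenness check standing in for a verifier, majority voting as the consensus rule, and the `min_agree` threshold.

```python
from collections import Counter


# Hypothetical agents: each maps a task (an integer) to a candidate answer.
def agent_a(task):
    return task * 2


def agent_b(task):
    return task + task


def agent_c(task):
    return task * 2 + 1  # deliberately faulty agent


def verifier(task, answer):
    """Hypothetical independent check: a doubled integer must be even."""
    return answer % 2 == 0


def filter_experience(task, agents, min_agree=2):
    """Return (task, answer) only if a verified answer reaches consensus."""
    candidates = [agent(task) for agent in agents]
    verified = [c for c in candidates if verifier(task, c)]
    if not verified:
        return None
    answer, votes = Counter(verified).most_common(1)[0]
    return (task, answer) if votes >= min_agree else None


# Only experiences that pass both checks are written to memory.
memory = []
for task in [3, 5]:
    experience = filter_experience(task, [agent_a, agent_b, agent_c])
    if experience is not None:
        memory.append(experience)
```

In this sketch the faulty agent's answers are dropped by the verifier, and the remaining agreement between the other two agents decides what gets stored.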
AI agents exceed the baseline on only about 18% of genuine scientific tasks because they tend to reframe problems rather than solve them through real innovation.
Frontier LLMs solve fewer than one-third of 87 multi-GPU CUDA benchmark tasks, though some generated kernels still outperform public reference implementations.