NatureBench: How Far Coding Agents Really Get on Scientific Tasks

24 June 2026

AI Models, Claude AI, Claude Code

AI agents exceed baseline on only roughly 18 percent of genuine scientific tasks because they tend to reframe problems rather than solve them with true innovation.