GauntletBench: New Benchmark Reveals Limitations of AI Agents

Current AI agents fail at complex visual tasks in professional applications far more frequently than previous benchmarks suggest.
