Reasoning Arena: Anthropic Uses Pairwise Comparisons Instead of Verification for LLM Training

June 10, 2026
AI Models, Claude AI

Reasoning Arena replaces uninformative rewards with head-to-head comparisons of solution attempts and reduces the required compute time by 27 to 41 percent.