Infrastructure resource configuration can shift agentic coding benchmark scores by up to 6 percentage points. Tests show that error rates decline when more resource headroom is available, which raises questions about the validity of model comparisons on such benchmarks.