Adversarial Hacker-Fixer Loops Close Security Gaps in Agent Benchmarks
10 June 2026 | AI Models, Claude Code
An automated system of competing AI agents iteratively finds and closes exploits in agent benchmarks without requiring manual per-task patches.

CHERRL: Controlled Analysis of Reward Hacking in LLM-Based Reinforcement Learning Systems
4 June 2026 | AI Models, Claude Code, Cybersecurity
CHERRL enables reproducible analysis of reward hacking mechanisms through controlled bias injection and automatic detection of exploitation onset in LLM-based training.