AI agents can be trained as data scientists to automatically generate high-quality synthetic training data, which continuously improves through meta-optimization.
Arbor coordinates autonomous AI agents via persistent hypothesis trees and achieved 2.5× better results than Codex and Claude Code on six research tasks.
RL environments with software bugs (stale caches, reward hacks, false state transitions) generate toxic training data that sabotages agent training, so systematic quality validation is necessary.
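
To make the validation point concrete, here is a minimal sketch of an automated environment check. It is not taken from any of the systems named above: `CounterEnv`, `rollout`, and `validate_env` are hypothetical names, and the toy environment exists only to simulate a stale-cache bug. The check replays the same action sequence twice on one environment instance and flags diverging traces or out-of-range rewards.

```python
from typing import Callable, List, Sequence


class CounterEnv:
    """Hypothetical toy environment: the state grows by the action value
    and the episode ends once the state reaches 3."""

    def __init__(self, stale_cache: bool = False):
        self.stale_cache = stale_cache  # simulate a reset that keeps old state
        self.state = 0

    def reset(self) -> int:
        if not self.stale_cache:
            self.state = 0
        return self.state

    def step(self, action: int):
        self.state += action
        done = self.state >= 3
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def rollout(env, actions: Sequence[int]) -> list:
    """Reset the environment, apply the actions, and record everything observed."""
    trace = [env.reset()]
    for action in actions:
        obs, reward, done = env.step(action)
        trace.append((obs, reward, done))
        if done:
            break
    return trace


def validate_env(make_env: Callable[[], object],
                 actions: Sequence[int],
                 max_reward: float) -> List[str]:
    """Return a list of detected problems; an empty list means the checks passed."""
    problems = []
    env = make_env()
    first = rollout(env, actions)
    second = rollout(env, actions)  # same instance, so reset() must undo episode one
    if first != second:
        problems.append("reset does not restore the initial state (possible stale cache)")
    if any(step[1] > max_reward for step in first[1:] + second[1:]):
        problems.append("reward exceeds the documented maximum (possible reward hack)")
    return problems


if __name__ == "__main__":
    print(validate_env(lambda: CounterEnv(), [1, 1, 1], max_reward=1.0))
    print(validate_env(lambda: CounterEnv(stale_cache=True), [1, 1, 1], max_reward=1.0))
```

Checks like this only catch bugs that show up as non-reproducible traces or impossible rewards. False state transitions that are deterministic would need a reference model or hand-written invariants on top of this.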