RL environments with software bugs (stale cache, reward hacks, false state transitions) generate toxic training data that sabotage agent training – systematic quality validation is necessary.
STRIDE formalizes training data attribution as a sparse recovery problem in activation space, achieving an order of magnitude faster results than gradient-based methods.