Bebop: Rejection Sampling Improves Multi-Token Prediction in RL Training

11 June 2026
AI Models, Claude Code

Bebop uses rejection sampling and TV loss optimization to maintain stable MTP acceptance rates during RL training and accelerates rollouts by up to 1.8x.