Flawed-CoT: Teaching Language Models to Recover from Flawed Mathematical Reasoning
We present Flawed-CoT, a training methodology that teaches language models to identify and recover from reasoning errors, using chain-of-thought traces with injected errors and GRPO optimization. The method achieves 24% accuracy on perturbed problems versus 19% for standard RL, while maintaining 41% accuracy on clean problems.
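The core data-construction step is splicing a deliberately wrong step into an otherwise correct chain-of-thought so the model can be trained to flag and recover from it. The sketch below is a minimal illustration of that idea; the function name `inject_error`, the step format, and the returned flaw position are assumptions for illustration, not the paper's actual pipeline.

```python
import random

def inject_error(steps, error_step, position=None, seed=0):
    """Return a copy of a chain-of-thought with one flawed step spliced in.

    steps: list of correct reasoning strings.
    error_step: a deliberately wrong step to insert.
    position: index at which to insert; chosen at random if None.
    """
    rng = random.Random(seed)
    pos = position if position is not None else rng.randrange(len(steps) + 1)
    corrupted = steps[:pos] + [error_step] + steps[pos:]
    # Record where the flaw sits so a reward signal can check whether
    # the model identifies the error and recovers downstream of it.
    return corrupted, pos

steps = ["2 + 3 = 5", "5 * 4 = 20"]
corrupted, pos = inject_error(steps, "2 + 3 = 6", position=1)
```

During training, such perturbed traces would be mixed with clean ones, and the RL objective (GRPO here) rewards completions that still reach the correct final answer despite the flawed prefix.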