ThoughtFold identifies and removes redundant exploration steps in reasoning chains, reducing token consumption by 56% for DeepSeek-R1-Distill-Qwen-7B while maintaining state-of-the-art accuracy.
VaSE achieves higher accuracy than existing sparse-attention methods at 4x KV-cache compression, thereby reducing the memory bottleneck of reasoning models.