| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| **Pruning Analysis:** Experiments showing the impact of pruning low-entropy vs. high-entropy steps on DeepSeek-R1-Distill-Qwen-7B. | | | | |
| GSM8K | Accuracy (%) | 79.2 | 79.2 | 0.0 |
| GSM8K | Accuracy (%) | 79.2 | 20.0 | -59.2 |
| GSM8K | Accuracy (%) | 79.2 | 55.0 | -24.2 |
| **Training Results:** Performance of the model trained (SFT+GRPO) to autonomously compress its chain of thought (CoT). | | | | |
| GSM8K | Token Count | 709 | 395 | -314 |
| MATH | Token Count | 996 | 428 | -568 |
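
The token-count rows report absolute differences only. As a rough illustration, the relative reduction can be derived from the same numbers. The sketch below copies the Baseline and This Paper token counts from the table; the names `token_counts` and `relative_reduction` are hypothetical helpers introduced here and are not from the paper.

```python
# Token counts copied from the Training Results rows of the table:
# benchmark -> (baseline tokens, tokens after training).
token_counts = {
    "GSM8K": (709, 395),
    "MATH": (996, 428),
}


def relative_reduction(baseline: int, compressed: int) -> float:
    """Return the fraction of baseline tokens that were removed."""
    return (baseline - compressed) / baseline


# Hypothetical helper output: fraction of tokens saved per benchmark.
reductions = {
    name: relative_reduction(baseline, compressed)
    for name, (baseline, compressed) in token_counts.items()
}

for name, fraction in reductions.items():
    print(f"{name}: {fraction:.1%} fewer tokens")
# prints:
# GSM8K: 44.3% fewer tokens
# MATH: 57.0% fewer tokens
```

Under these figures the compressed outputs are a bit under half as long as the baseline on GSM8K and somewhat less than half on MATH.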