Main comparison results using LLaMA-7B as the student model across four benchmarks. KPOD (the "This Paper" column) outperforms the baseline on all four, by 2.12 to 5.17 accuracy points.

| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| GSM8K | Accuracy | 46.10 | 48.22 | +2.12 |
| SVAMP | Accuracy | 63.30 | 66.35 | +3.05 |
| MultiArith | Accuracy | 83.50 | 88.67 | +5.17 |
| StrategyQA | Accuracy | 64.91 | 68.38 | +3.47 |

Ablation study demonstrating the contribution of each component (Token Weighting and Progressive Distillation).

| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| Average (All 4) | Accuracy | 65.68 | 67.91 | +2.23 |
| Average (All 4) | Accuracy | 64.12 | 67.91 | +3.79 |
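
The Δ column and the 67.91 average can be re-derived from the per-benchmark rows. The sketch below is only a sanity check on the table's arithmetic, not part of the paper's method. The names `rows`, `delta`, and `mean_2dp` are illustrative, and rounding half-up to two decimals is an assumption about how the averages were reported.

```python
from decimal import Decimal, ROUND_HALF_UP

# (benchmark, baseline, this_paper), copied from the main comparison rows.
# Values are kept as strings so Decimal parses them exactly.
rows = [
    ("GSM8K", "46.10", "48.22"),
    ("SVAMP", "63.30", "66.35"),
    ("MultiArith", "83.50", "88.67"),
    ("StrategyQA", "64.91", "68.38"),
]


def delta(baseline: str, ours: str) -> Decimal:
    """Absolute gain in accuracy points (This Paper minus Baseline)."""
    return Decimal(ours) - Decimal(baseline)


def mean_2dp(values: list[str]) -> Decimal:
    """Mean rounded half-up to two decimals (assumed rounding rule)."""
    total = sum(Decimal(v) for v in values)
    return (total / len(values)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


deltas = {name: delta(base, ours) for name, base, ours in rows}
this_paper_mean = mean_2dp([ours for _, _, ours in rows])

for name, gain in deltas.items():
    print(f"{name}: +{gain}")
print(f"Mean of the This Paper column: {this_paper_mean}")
```

Under these assumptions, the recomputed gains match the Δ column, and the mean of the four "This Paper" values matches the 67.91 reported in the "Average (All 4)" rows.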