| Benchmark | Metric | Baseline (%) | This Paper (%) | Δ (pts) |
|---|---|---|---|---|
| **Main results on MATH and GSM8K: Step-DPO improvements over base models and vanilla DPO** | | | | |
| MATH | Accuracy | 67.9 | 70.8 | +2.9 |
| GSM8K | Accuracy | 91.1 | 94.0 | +2.9 |
| MATH | Accuracy | 47.2 | 58.6 | +11.4 |
| MATH | Accuracy | 52.8 | 56.0 | +3.2 |
| **Ablation study on data source distribution (in-distribution vs. out-of-distribution)** | | | | |
| MATH | Accuracy | 50.1 | 53.0 | +2.9 |
| MATH | Accuracy | 50.8 | 53.0 | +2.2 |
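
The Δ column appears to be the absolute difference between the two accuracy columns, that is, a gain in percentage points rather than a relative improvement. A minimal sketch that recomputes it from the table values, assuming the scores are accuracy percentages (the names `rows` and `delta` are illustrative and not from the paper):

```python
# (benchmark, baseline, this_paper) triples copied from the table rows, in order.
rows = [
    ("MATH", 67.9, 70.8),
    ("GSM8K", 91.1, 94.0),
    ("MATH", 47.2, 58.6),
    ("MATH", 52.8, 56.0),
    ("MATH", 50.1, 53.0),
    ("MATH", 50.8, 53.0),
]


def delta(baseline: float, result: float) -> float:
    """Absolute gain in percentage points, rounded to one decimal place."""
    return round(result - baseline, 1)


deltas = [delta(baseline, result) for _, baseline, result in rows]

for (name, baseline, result), d in zip(rows, deltas):
    print(f"{name}: {baseline} -> {result} ({d:+.1f})")
```

Rounding to one decimal place avoids floating-point noise when comparing the recomputed values against the reported column.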