| Benchmark | Metric | Baseline (%) | This Paper (%) | Δ (pp) |
|---|---|---|---|---|
| Main results on the challenging MATH benchmark showing rStar-Math's improvement over base models and comparison to proprietary SOTA. | | | | |
| MATH | Accuracy (Pass@1) | 58.8 | 90.0 | +31.2 |
| MATH | Accuracy (Pass@1) | 51.2 | 87.8 | +36.6 |
| Results on Olympiad-level AIME 2024 showing capability on very hard problems. | | | | |
| AIME 2024 | Accuracy | 46.7 | 53.3 | +6.6 |
| Ablation study demonstrating the superiority of the Process Preference Model (PPM) over other reward modeling approaches. | | | | |
| MATH | Accuracy | 84.2 | 86.6 | +2.4 |
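
The Δ column in the table above can be read as an absolute gain in percentage points (the "This Paper" score minus the baseline), not a relative improvement. As a minimal sketch under that assumption, the snippet below recomputes each Δ from the values in the table. The row labels, the `rows` list, and the `delta_pp` helper are illustrative names, not taken from the paper.

```python
# Rows copied from the table above:
# (label, baseline %, this paper %, reported delta in percentage points).
# The labels are illustrative; the table does not name the underlying models.
rows = [
    ("MATH, first row", 58.8, 90.0, 31.2),
    ("MATH, second row", 51.2, 87.8, 36.6),
    ("AIME 2024", 46.7, 53.3, 6.6),
    ("MATH, PPM ablation", 84.2, 86.6, 2.4),
]


def delta_pp(baseline: float, ours: float) -> float:
    """Absolute gain in percentage points, rounded to one decimal place."""
    return round(ours - baseline, 1)


for label, baseline, ours, reported in rows:
    computed = delta_pp(baseline, ours)
    status = "matches" if computed == reported else "differs from"
    print(f"{label}: {computed:+.1f} pp ({status} reported {reported:+.1f})")
```

Rounding to one decimal place matters here because the raw floating-point differences (for example `90.0 - 58.8`) are not exactly representable, so an unrounded comparison against the reported values could fail spuriously.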