| Benchmark | Metric | Baseline | This Paper | Δ (This Paper − Baseline) |
|---|---|---|---|---|
| Main results comparing ScRPO to baselines on the 1.5B model size. | | | | |
| Average (5 datasets) | Accuracy | 62.5 | 64.8 | +2.3 |
| AIME-2024 | Accuracy | 46.7 | 49.0 | +2.3 |
| Main results comparing ScRPO to baselines on the 7B model size. | | | | |
| Average (5 datasets) | Accuracy | 76.4 | 77.8 | +1.4 |
| AIME-2024 | Accuracy | 64.0 | 66.7 | +2.7 |
| Ablation studies validating the contributions of specific components. | | | | |
| Average | Accuracy | 63.7 | 64.8 | +1.1 |
| Average | Accuracy | 63.4 | 64.8 | +1.4 |