| Benchmark | Metric | Baseline | This Paper | Δ (absolute) |
|---|---|---|---|---|
| MATH (representative subset) | % Solved (Best-of-N) | 72.4 | 78.2 | +5.8 |
| MATH (representative subset) | % Solved (Best-of-N) | 69.6 | 78.2 | +8.6 |
| Aggregate STEM OOD | % Problems Solved (Best-of-100) | 63.8 | 72.9 | +9.1 |
| Aggregate STEM OOD | % Problems Solved (Best-of-100) | 61.3 | 72.9 | +11.6 |
| MATH (small-scale ablation) | Data Efficiency Multiplier | 1.0× | 2.6× | +1.6× |
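
The Δ column appears to be the plain difference between the "This Paper" and "Baseline" columns: percentage points for the % Solved rows, and a difference of multipliers for the ablation row. The sketch below recomputes it from the values in the table as a consistency check. The names `rows`, `delta`, and `mismatches` are illustrative and not from the paper, and rounding to one decimal place is an assumption made to match the precision shown.

```python
# Values copied from the table: (benchmark, baseline, this_paper, reported_delta).
# The two MATH rows and the two STEM OOD rows differ only in their baseline value.
rows = [
    ("MATH (representative subset)", 72.4, 78.2, 5.8),
    ("MATH (representative subset)", 69.6, 78.2, 8.6),
    ("Aggregate STEM OOD", 63.8, 72.9, 9.1),
    ("Aggregate STEM OOD", 61.3, 72.9, 11.6),
    ("MATH (small-scale ablation)", 1.0, 2.6, 1.6),
]


def delta(baseline: float, result: float) -> float:
    """Absolute difference, rounded to one decimal place (assumed table precision)."""
    return round(result - baseline, 1)


# Collect any row whose recomputed delta disagrees with the reported one.
mismatches = [
    (name, baseline, result, reported)
    for name, baseline, result, reported in rows
    if delta(baseline, result) != reported
]

for name, baseline, result, reported in rows:
    print(f"{name}: {result} - {baseline} = {delta(baseline, result):+.1f}")
```

For the first four rows the difference is in percentage points rather than a relative improvement; for the ablation row, the 2.6 multiplier is itself already a ratio against the 1.0 baseline.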