Comparison against the Dr. GRPO baseline across different base models (one row per base model). Average Pass@1 accuracy improves consistently, by +2.5 and +2.4 points.

| Benchmark | Metric | Dr. GRPO (baseline) | This Paper | Δ |
|---|---|---|---|---|
| Average (AMC, MATH500, Minerva, OlympiadBench) | Pass@1 accuracy (%) | 53.5 | 56.0 | +2.5 |
| Average (AMC, MATH500, Minerva, OlympiadBench) | Pass@1 accuracy (%) | 47.3 | 49.7 | +2.4 |
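The Δ column is the absolute difference, in percentage points, between the two averaged Pass@1 scores. The sketch below reproduces the Δ values from the reported numbers. It assumes that the "Average" column is an unweighted mean over the four benchmarks, because the table does not state the weighting. `macro_average` and `delta` are hypothetical helper names used only for illustration.

```python
def macro_average(scores):
    """Unweighted mean over benchmarks (assumed definition of the 'Average' column)."""
    if not scores:
        raise ValueError("need at least one benchmark score")
    return sum(scores.values()) / len(scores)


def delta(baseline, ours, ndigits=1):
    """Absolute improvement in percentage points, rounded to one decimal as in the table."""
    return round(ours - baseline, ndigits)


# Reported averages from the table: (Dr. GRPO baseline, this paper), one pair per base model.
rows = [(53.5, 56.0), (47.3, 49.7)]
for baseline, ours in rows:
    print(f"{baseline:.1f} -> {ours:.1f}: {delta(baseline, ours):+.1f}")
```

Rounding matters here: `49.7 - 47.3` is not exactly `2.4` in floating point, so the difference is rounded to the table's precision before comparison.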