TemplateRL consistently outperforms baselines on competition-level math benchmarks, with larger gains on harder tasks.

| Benchmark | Metric | Baseline (%) | This Paper (%) | Δ (points) |
|---|---|---|---|---|
| AIME 2024 | Accuracy | 16.7 | 33.3 | 16.6 |
| AMC | Accuracy | 45.0 | 63.4 | 18.4 |
| MATH500 | Accuracy | 66.4 | 72.6 | 6.2 |
| Average (5 benchmarks) | Accuracy | 43.8 | 55.8 | 12.0 |
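
As a quick arithmetic check on the table, the sketch below recomputes the Δ column, assuming Δ is the absolute difference in accuracy points (This Paper minus Baseline). It also derives a relative gain over the baseline, which is not part of the original table and is added here only as an illustration. The names `results`, `absolute_gain`, and `relative_gain` are hypothetical.

```python
# Values copied from the table above: (baseline accuracy, this paper's accuracy).
results = {
    "AIME 2024": (16.7, 33.3),
    "AMC": (45.0, 63.4),
    "MATH500": (66.4, 72.6),
    "Average (5 benchmarks)": (43.8, 55.8),
}


def absolute_gain(baseline: float, ours: float) -> float:
    """Difference in accuracy points, rounded to one decimal like the table."""
    return round(ours - baseline, 1)


def relative_gain(baseline: float, ours: float) -> float:
    """Improvement as a percentage of the baseline accuracy (illustrative only)."""
    return round((ours - baseline) / baseline * 100, 1)


for name, (baseline, ours) in results.items():
    delta = absolute_gain(baseline, ours)
    rel = relative_gain(baseline, ours)
    print(f"{name}: +{delta} points ({rel}% relative to baseline)")
```

Under these assumptions, the relative view may be the more natural reading of "larger gains on harder tasks": the lower the baseline accuracy on a benchmark, the larger the improvement as a fraction of that baseline.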