On AIME 2024, TTRL improves Pass@1 for every base model tested, with absolute gains ranging from 5.4 to 27.3 points.

| Benchmark | Metric | Baseline (%) | TTRL (%) | Δ (points) |
|---|---|---|---|---|
| AIME 2024 | Pass@1 | 12.9 | 40.2 | +27.3 |
| AIME 2024 | Pass@1 | 51.7 | 69.2 | +17.5 |
| AIME 2024 | Pass@1 | 4.6 | 10.0 | +5.4 |

On MATH-500, gains are largest for smaller models: +40.3 points from a 32.7 baseline, compared with +1.0 point from an already strong 84.2 baseline.

| Benchmark | Metric | Baseline (%) | TTRL (%) | Δ (points) |
|---|---|---|---|---|
| MATH-500 | Pass@1 | 32.7 | 73.0 | +40.3 |
| MATH-500 | Accuracy | 84.2 | 85.2 | +1.0 |
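The Δ column is the absolute difference in percentage points between the TTRL score and the baseline score. Absolute gains can understate improvements on weak baselines, so the minimal sketch below recomputes Δ and also reports the relative gain (TTRL score divided by baseline score). The scores are copied from the table; the row labels `model A` to `model E` are hypothetical placeholders, because the table does not name the base models.

```python
# Scores copied from the table (percent). The model labels are
# hypothetical placeholders; the source table does not name base models.
ROWS = [
    ("AIME 2024", "model A", "Pass@1", 12.9, 40.2),
    ("AIME 2024", "model B", "Pass@1", 51.7, 69.2),
    ("AIME 2024", "model C", "Pass@1", 4.6, 10.0),
    ("MATH-500", "model D", "Pass@1", 32.7, 73.0),
    ("MATH-500", "model E", "Accuracy", 84.2, 85.2),
]


def deltas(rows):
    """Return (benchmark, label, metric, absolute gain in points, relative gain) per row."""
    out = []
    for bench, label, metric, base, ttrl in rows:
        abs_gain = round(ttrl - base, 1)   # matches the table's Δ column
        rel_gain = round(ttrl / base, 2)   # ratio of TTRL score to baseline score
        out.append((bench, label, metric, abs_gain, rel_gain))
    return out


if __name__ == "__main__":
    for bench, label, metric, a, r in deltas(ROWS):
        print(f"{bench:10s} {label:8s} {metric:9s} {a:+5.1f} pts  x{r:.2f}")
```

Relative gain makes the contrast on MATH-500 explicit: the weaker baseline more than doubles, while the strong baseline changes by about one percent of its starting score.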