| Benchmark | Metric | Baseline (RLVR) | This Paper (RLTR) | Δ |
|---|---|---|---|---|
| On hard math benchmarks, RLTR improves consistency (Maj@64) over RLVR by 4.4 to 5.8 points. | ||||
| AMC23 | Maj@64 | 61.7 | 67.5 | +5.8 |
| AIME 2024 | Maj@64 | 16.7 | 21.1 | +4.4 |
| AIME 2024 | Average Accuracy | 9.8 | 14.8 | +5.0 |
| Results on moderate-difficulty benchmarks show that RLTR fixes RLVR's consistency degradation at high K. | ||||
| MATH-500 | Maj@64 | 82.6 | 84.2 | +1.6 |
| GSM8K | Average Accuracy | 89.1 | 92.0 | +2.9 |
| Computational efficiency analysis. | ||||
| MATH-500 | EFLOPs (ExaFLOPs) to reach convergence | 39.76 | 92.75 | +52.99 |
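
The two accuracy metrics in the table can be illustrated with a minimal sketch. It assumes Maj@K means "the most frequent answer among K sampled responses matches the reference" and that Average Accuracy is the fraction of individual samples that are correct. The function names, the string-normalized answers, and the toy data are hypothetical and are not taken from the paper; ties are broken by first occurrence, which is also an assumption.

```python
from collections import Counter


def maj_at_k(samples, reference):
    """Return 1.0 if the majority-vote answer over the K samples equals the reference.

    Assumption: answers are already normalized to comparable strings.
    Ties go to the answer that appears first (Counter preserves insertion order).
    """
    if not samples:
        return 0.0
    top_answer, _ = Counter(samples).most_common(1)[0]
    return 1.0 if top_answer == reference else 0.0


def average_accuracy(samples, reference):
    """Return the fraction of individual samples that equal the reference."""
    if not samples:
        return 0.0
    return sum(answer == reference for answer in samples) / len(samples)


# Hypothetical toy data: K = 8 sampled answers for one problem.
samples = ["42", "41", "42", "42", "7", "41", "42", "13"]

print(maj_at_k(samples, "42"))          # 1.0
print(average_accuracy(samples, "42"))  # 0.5
```

In this toy case only half of the samples are correct, yet the majority vote is still right, which is why Maj@K can sit well above Average Accuracy on the same benchmark. Benchmark-level scores would be the mean of these per-problem values.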