| Benchmark | Metric | Baseline | This Paper | Δ (This Paper − Baseline) |
|---|---|---|---|---|
| AdaCtrl-7B vs. the primary RL baseline (R1-SFT-RL): accuracy improves and response length drops on every dataset. | | | | |
| AIME2025 | Accuracy | 46.67 | 48.34 | +1.67 |
| AIME2025 | Length | 9089 | 7986 | -1103 |
| MATH500 | Length | 6924 | 2628 | -4296 |
| GSM8K | Length | 2914 | 261 | -2653 |
| Larger 14B model: even stronger accuracy gains on hard tasks. | | | | |
| AIME2024 | Accuracy | 50.42 | 60.83 | +10.41 |
| AIME2024 | Length | 13149 | 10756 | -2393 |
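
The Δ column is a plain difference (This Paper minus Baseline), which makes length savings hard to compare across benchmarks whose baselines differ by an order of magnitude. The sketch below recomputes Δ from the table and adds a relative change in percent. The row values are copied from the table; the names `ROWS`, `delta`, and `relative_change`, and the "7B"/"14B" labels, are illustrative assumptions and do not come from the paper.

```python
# Rows copied from the table: (model, benchmark, metric, baseline, this_paper).
# The model labels follow the two group rows in the table.
ROWS = [
    ("7B", "AIME2025", "Accuracy", 46.67, 48.34),
    ("7B", "AIME2025", "Length", 9089, 7986),
    ("7B", "MATH500", "Length", 6924, 2628),
    ("7B", "GSM8K", "Length", 2914, 261),
    ("14B", "AIME2024", "Accuracy", 50.42, 60.83),
    ("14B", "AIME2024", "Length", 13149, 10756),
]


def delta(baseline, this_paper):
    """Absolute change (this_paper - baseline), rounded to two decimals."""
    return round(this_paper - baseline, 2)


def relative_change(baseline, this_paper):
    """Change as a percentage of the baseline, rounded to one decimal."""
    return round(100.0 * (this_paper - baseline) / baseline, 1)


if __name__ == "__main__":
    for model, bench, metric, base, ours in ROWS:
        print(f"{model:>3} {bench:<9} {metric:<8} "
              f"delta={delta(base, ours):+} "
              f"relative={relative_change(base, ours):+}%")
```

Read this way, the GSM8K length reduction (2914 to 261) is about 91% of the baseline, while the AIME2025 reduction (9089 to 7986) is about 12%, a contrast the absolute Δ values alone do not make obvious.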