| Benchmark | Metric | Baseline (%) | This Paper (%) | Δ (pts) |
|---|---|---|---|---|
| Main results: MarsRL raises the 30B model's accuracy above both its base model and a much larger 235B model. | | | | |
| AIME 2025 | Accuracy | 86.5 | 93.3 | +6.8 |
| BeyondAIME | Accuracy | 64.9 | 73.8 | +8.9 |
| AIME 2025 | Accuracy | 92.3 | 93.3 | +1.0 |
| Ablation on sampling strategies: adaptive sampling (prioritizing hard/correct pairs) performs best. | | | | |
| AIME 2025 | Accuracy | 91.8 | 93.3 | +1.5 |