| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| Average of 5 benchmarks (AIME24, MATH500, OlympiadMath, MinervaMath, AMC) | Average score | Not reported in the paper | Not reported in the paper | +33.00 |
| DeepSeek-R1-Distill-Qwen-7B evaluation | Score | Not reported in the paper | Not reported in the paper | +4.50 |
| Math reasoning datasets (training plots) | Performance during training | Not reported in the paper | Not reported in the paper | +30% |