Comparison of training paradigms on the 7B model: RLVR on text-only data yields the highest average accuracy, while SFT degrades it below the baseline.

| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| Average (6 benchmarks), RLVR on text-only data | Accuracy | 53.5 | 54.9 | +1.4 |
| Average (6 benchmarks), SFT | Accuracy | 53.5 | 43.8 | -9.7 |