Main results on Qwen2.5-1.5B-Math show OXA significantly outperforms conventional SFT across averaged benchmarks.

| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| Average (6 benchmarks) | Pass@1 | Not reported in the paper | Not reported in the paper | +6.6 |
| Average (6 benchmarks) | Pass@k | Not reported in the paper | Not reported in the paper | +5.5 |