| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| General-domain results (trained on WebInstruct): CER outperforms the baseline on both MMLU-Pro and SuperGPQA. | ||||
| MMLU-Pro | pass@1 | 47.5 | 48.1 | +0.6 |
| SuperGPQA | pass@1 | 32.8 | 33.5 | +0.7 |
| Mathematical-domain results (trained on MATH-7.5K): CER is competitive with highly specialized rule-based verifiers and outperforms model-based verifiers. | ||||
| MATH500 | pass@1 | 59.2 | 58.6 | -0.6 |
| MATH500 | pass@1 | 59.2 | 60.1 | +0.9 |
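
The Δ column appears to be the "This Paper" score minus the "Baseline" score. The sketch below rechecks that arithmetic for the rows above. The subtraction convention, the one-decimal rounding, and the names `ROWS`, `delta`, and `mismatches` are illustrative assumptions, not something the source states.

```python
# Rows copied from the table: (benchmark, baseline, this_paper, reported_delta).
ROWS = [
    ("MMLU-Pro", 47.5, 48.1, 0.6),
    ("SuperGPQA", 32.8, 33.5, 0.7),
    ("MATH500", 59.2, 58.6, -0.6),
    ("MATH500", 59.2, 60.1, 0.9),
]


def delta(baseline: float, this_paper: float) -> float:
    """Assumed convention: This Paper minus Baseline, rounded to one decimal."""
    return round(this_paper - baseline, 1)


def mismatches(rows):
    """Return rows whose recomputed delta differs from the reported one."""
    return [
        (name, delta(base, ours), reported)
        for name, base, ours, reported in rows
        if delta(base, ours) != reported
    ]


if __name__ == "__main__":
    # An empty list means every reported delta matches the recomputed value.
    print(mismatches(ROWS))
```

Rounding to one decimal before comparing avoids false mismatches from floating-point subtraction.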