PACER consistently outperforms the strong DeepConf-Online baseline across multiple hard math benchmarks using GPT-OSS.

| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| HMMT 2025 | Accuracy (Score/30) | 25 | 35 | +10 |
| AIME / BRUMO | Accuracy | Not reported in the paper | Not reported in the paper | Not reported in the paper |