| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| Main comparison showing CSR's faithfulness improvements (COS) over Process Reward Models (PRM). | | | | |
| GSM8K | COS | Not explicitly reported in the paper | Not explicitly reported in the paper | +32.8 |
| HotpotQA | COS | Not explicitly reported in the paper | Not explicitly reported in the paper | +34.8 |
| PubMedQA | COS | 28.7 | 67.3 | +38.6 |
| Average | Training Overhead | 92.5% | 9% | -83.5 pp |
| Held-out perturbations | COS | 8-18% | 64-77% | Not reported in the paper |
| GSM8K | Human Rating (1-5) | 2.3 | 4.1 | +1.8 |
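
The Δ column appears to be the plain difference between the "This Paper" and "Baseline" values. As a minimal sketch under that assumption, the snippet below recomputes Δ for the three rows where both values are reported. The names `rows`, `delta`, and `deltas` are illustrative and do not come from the paper.

```python
# Rows from the table above that report both values:
# (benchmark, metric, baseline, this_paper)
rows = [
    ("PubMedQA", "COS", 28.7, 67.3),
    ("Average", "Training Overhead (%)", 92.5, 9.0),
    ("GSM8K", "Human Rating (1-5)", 2.3, 4.1),
]


def delta(baseline: float, this_paper: float) -> float:
    """Assumed definition: delta = this_paper - baseline, rounded to one decimal."""
    return round(this_paper - baseline, 1)


# Map each (benchmark, metric) pair to its recomputed delta.
deltas = {(bench, metric): delta(base, ours) for bench, metric, base, ours in rows}

for (bench, metric), d in deltas.items():
    print(f"{bench} / {metric}: {d:+.1f}")
```

Under this definition the training-overhead difference is a change in percentage points, not a relative reduction.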