| Benchmark | Metric | Baseline | This Paper | Δ (This Paper − Baseline) |
|---|---|---|---|---|
| Mix-GRM outperforms baselines on general reward benchmarks, with RLVR providing significant amplification. | ||||
| Average of 5 Benchmarks | Average Score | 76.9 | 79.4 | +2.5 |
| Average of 5 Benchmarks | Average Score | 70.1 | 75.1 | +5.0 |
| Average of 5 Benchmarks | Average Score | 65.2 | 75.1 | +9.9 |
| Downstream utility experiments show Mix-GRM excels as a verifier and supervisor. | ||||
| MATH | Best-of-N Accuracy (N=10) | 37.7 | 43.2 | +5.5 |
| Instruction Following | Win Rate | 12.0 | 12.1 | +0.1 |
| GSM8K | Accuracy | 75.1 | 77.6 | +2.5 |
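
For readers unfamiliar with the Best-of-N Accuracy metric reported for MATH, the following is a minimal sketch of how such a number is typically computed: for each problem, N candidate answers are scored by a reward model, the top-scoring candidate is selected, and accuracy is the percentage of problems where that selection matches the reference. The names `best_of_n`, `best_of_n_accuracy`, and `toy_score` are hypothetical and chosen for illustration; `toy_score` is a stand-in for a real reward model, and nothing here should be read as the paper's actual evaluation code.

```python
from typing import Callable, List, Sequence, Tuple


def best_of_n(candidates: Sequence[str], score: Callable[[str], float]) -> str:
    """Return the candidate with the highest score (first one wins on ties)."""
    return max(candidates, key=score)


def best_of_n_accuracy(
    problems: List[Tuple[Sequence[str], str]],
    score: Callable[[str], float],
) -> float:
    """Percentage of problems whose top-scored candidate equals the reference.

    Each problem is a pair (candidate answers, reference answer).
    """
    correct = sum(best_of_n(cands, score) == ref for cands, ref in problems)
    return 100.0 * correct / len(problems)


def toy_score(answer: str) -> float:
    """Hypothetical placeholder for a reward model: prefers longer strings."""
    return float(len(answer))


# Two toy problems with a handful of candidates each (N=10 in the table above).
problems = [
    (["4", "22", "5"], "4"),   # toy scorer picks "22", which is wrong
    (["7", "10"], "10"),       # toy scorer picks "10", which is right
]
accuracy = best_of_n_accuracy(problems, toy_score)
print(accuracy)
```

Under this reading, a stronger verifier raises the metric only by ranking a correct candidate first more often; the candidate pool itself is unchanged.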