| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| RewardBench vs Downstream RLHF | Correlation | 0 | Negative | Negative |

The paper primarily validates the benchmark itself by showing correlations. Quantitative results for specific reward models are available in the appendix, but the core claim is the *validation* of the metric.