| Benchmark | Metric | Baseline | UNA (This Paper) | Δ (UNA − Baseline) |
|---|---|---|---|---|
| **Offline experiments: UNA vs. DPO and KTO on standard benchmarks, using different feedback types** | | | | |
| Open LLM Leaderboard (New) | Average Score | 28.53 | 30.92 | +2.39 |
| MT-Bench | Score | 5.99 | 6.78 | +0.79 |
| AlpacaEval | Win Rate | 3.67 | 8.78 | +5.11 |
| **Online experiments: UNA vs. RLHF (PPO), using a reward model** | | | | |
| Open LLM Leaderboard (New) | Average Score | 29.12 | 29.15 | +0.03 |
| MT-Bench | Score | 6.60 | 6.71 | +0.11 |
| AlpacaEval | Win Rate | 10.15 | 10.54 | +0.39 |
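
The Δ column can be re-derived from the two score columns. The following is a minimal sketch, assuming Δ is the UNA score minus the baseline score rounded to two decimals. The names `ROWS` and `recompute_deltas` are hypothetical and exist only for this illustration. The relative change it prints is derived from the table and is not a figure reported in the source.

```python
# Rows copied from the table above:
# (setting, benchmark, baseline score, UNA score, reported delta).
ROWS = [
    ("offline", "Open LLM Leaderboard (New)", 28.53, 30.92, 2.39),
    ("offline", "MT-Bench", 5.99, 6.78, 0.79),
    ("offline", "AlpacaEval", 3.67, 8.78, 5.11),
    ("online", "Open LLM Leaderboard (New)", 29.12, 29.15, 0.03),
    ("online", "MT-Bench", 6.60, 6.71, 0.11),
    ("online", "AlpacaEval", 10.15, 10.54, 0.39),
]


def recompute_deltas(rows):
    """Return (setting, benchmark, absolute delta, relative change in %) per row.

    Assumption: delta = UNA score - baseline score, rounded to two decimals.
    """
    results = []
    for setting, benchmark, baseline, una, _reported in rows:
        delta = round(una - baseline, 2)
        relative_pct = round(100.0 * (una - baseline) / baseline, 1)
        results.append((setting, benchmark, delta, relative_pct))
    return results


if __name__ == "__main__":
    for (setting, benchmark, delta, rel), row in zip(recompute_deltas(ROWS), ROWS):
        # Flag any row whose recomputed delta disagrees with the reported one.
        status = "ok" if abs(delta - row[4]) < 1e-9 else "MISMATCH"
        print(f"{setting:8s} {benchmark:28s} delta={delta:+.2f} ({rel:+.1f}%) {status}")
```

A check like this makes it easier to see that the absolute gains in the online setting are much smaller than those in the offline setting, while leaving their interpretation to the surrounding text.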