| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| Lynx-70B outperforms major closed and open-source models on the aggregated HaluBench dataset. | ||||
| HaluBench (Average) | Accuracy | 0.860 | 0.865 | +0.005 |
| HaluBench (Average) | Accuracy | 0.838 | 0.865 | +0.027 |
| HaluBench (Average) | Accuracy | 0.819 | 0.865 | +0.046 |
| Lynx significantly outperforms heuristic-based RAG metrics. | ||||
| HaluBench (Average) | Accuracy | 0.784 | 0.865 | +0.081 |
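
The Δ column appears to be the absolute accuracy difference, This Paper minus Baseline. A minimal sketch that recomputes it from the four data rows above, assuming that reading; the names `rows` and `delta` are illustrative and do not come from the source:

```python
# (baseline accuracy, this-paper accuracy) pairs copied from the table rows above.
rows = [
    (0.860, 0.865),
    (0.838, 0.865),
    (0.819, 0.865),
    (0.784, 0.865),
]


def delta(baseline: float, ours: float) -> float:
    """Return the absolute accuracy gain, rounded to three decimals."""
    # Rounding removes floating-point noise such as 0.0050000000000000044.
    return round(ours - baseline, 3)


deltas = [delta(b, o) for b, o in rows]
print(deltas)  # [0.005, 0.027, 0.046, 0.081]
```

The recomputed values match the Δ column, so each reported gain is an absolute difference in accuracy rather than a relative improvement.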