| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| TruthfulQA (Multiple Choice) | MC1 | 28.0 | 28.9 | +0.9 |
| TruthfulQA (Multiple Choice) | MC2 | 44.7 | 45.5 | +0.8 |
| TruthfulQA (Generation) | Truth*Info | 39.95 | 48.33 | +8.38 |
| TruthfulQA (Generation) | % Reject | 23.26 | 8.45 | -14.81 |
| Natural Questions | Accuracy (exact match, implied) | 23.7 | 26.1 | +2.4 |
| FACTOR (Expert) | Accuracy | 55.1 | 56.4 | +1.3 |
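
The Δ column appears to be the plain difference, This Paper minus Baseline. As a minimal sketch under that assumption, the snippet below recomputes it from the rows above using exact decimal arithmetic. The names `ROWS`, `delta`, and `deltas` are illustrative and do not come from the source.

```python
from decimal import Decimal

# (benchmark, metric, baseline, this_paper) copied from the table above.
# Values are kept as strings so Decimal parses them exactly.
ROWS = [
    ("TruthfulQA (Multiple Choice)", "MC1", "28.0", "28.9"),
    ("TruthfulQA (Multiple Choice)", "MC2", "44.7", "45.5"),
    ("TruthfulQA (Generation)", "Truth*Info", "39.95", "48.33"),
    ("TruthfulQA (Generation)", "% Reject", "23.26", "8.45"),
    ("Natural Questions", "Accuracy", "23.7", "26.1"),
    ("FACTOR (Expert)", "Accuracy", "55.1", "56.4"),
]


def delta(baseline: str, this_paper: str) -> Decimal:
    """Return this_paper - baseline (assumed definition of the delta column)."""
    return Decimal(this_paper) - Decimal(baseline)


# Map each (benchmark, metric) pair to its recomputed difference.
deltas = {
    (bench, metric): delta(base, new)
    for bench, metric, base, new in ROWS
}

for (bench, metric), d in deltas.items():
    # The "+" format flag prints an explicit sign, matching the table's style.
    print(f"{bench} | {metric} | {d:+}")
```

The sign of Δ only shows the direction of change. For % Reject, a negative value means fewer rejected answers; for the other metrics, a positive value means a higher score.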