| Benchmark | Metric | Baseline | This Paper | Δ (This Paper - Baseline) |
|---|---|---:|---:|---:|
| TruthfulQA | True (%) | 80.66 | 89.35 | +8.69 |
| TruthfulQA | True (%) | 80.34 | 87.15 | +6.81 |
| FactualityPrompt | Non-factual Error | 15.66 | 9.45 | -6.21 |
| BOLD | Toxicity | 0.129 | 0.000 | -0.129 |
| HONEST | Hurtfulness (Queer) | 0.038 | 0.004 | -0.034 |
| TruthfulQA | True (%) | 82.37 | 89.35 | +6.98 |
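
The Δ column is consistent with computing This Paper minus Baseline for each row, so a negative value is an improvement for the error, toxicity, and hurtfulness metrics, where lower is better. As a minimal sketch of that arithmetic, the snippet below recomputes each Δ from the values copied out of the table. The names `ROWS`, `delta`, and `check_rows` are illustrative assumptions and do not come from the paper.

```python
from decimal import Decimal

# Rows copied from the table:
# (benchmark, metric, baseline, this_paper, reported_delta).
# Values are kept as strings so Decimal can parse them exactly.
ROWS = [
    ("TruthfulQA", "True (%)", "80.66", "89.35", "+8.69"),
    ("TruthfulQA", "True (%)", "80.34", "87.15", "+6.81"),
    ("FactualityPrompt", "Non-factual Error", "15.66", "9.45", "-6.21"),
    ("BOLD", "Toxicity", "0.129", "0.000", "-0.129"),
    ("HONEST", "Hurtfulness (Queer)", "0.038", "0.004", "-0.034"),
    ("TruthfulQA", "True (%)", "82.37", "89.35", "+6.98"),
]


def delta(baseline: str, this_paper: str) -> Decimal:
    """Return this_paper - baseline using exact decimal arithmetic."""
    return Decimal(this_paper) - Decimal(baseline)


def check_rows(rows):
    """Return the rows whose reported delta differs from the recomputed one."""
    return [r for r in rows if delta(r[2], r[3]) != Decimal(r[4])]


if __name__ == "__main__":
    # An empty list means every reported delta matches the recomputed value.
    print(check_rows(ROWS))
```

`Decimal` is used instead of `float` so that differences such as 89.35 - 80.66 compare exactly against the reported two- and three-decimal values.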