SHINE achieves state-of-the-art hallucination detection performance across multiple datasets and models compared with unsupervised baselines.

| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| TriviaQA | AUROC | 0.81 | 0.88 | +0.07 |
| SQuAD | AUROC | 0.78 | 0.82 | +0.04 |
| TruthfulQA | AUROC | 0.68 | 0.83 | +0.15 |
| TriviaQA | AUROC | 0.73 | 0.88 | +0.15 |