| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| AGSER consistently outperforms baselines across different LLMs on hallucination detection AUC. | | | | |
| Average across Books/Movies/GCI | AUC | 0.850 | 0.886 | +0.036 |
| Average across Books/Movies/GCI | AUC | 0.867 | 0.895 | +0.028 |
| Average across Books/Movies/GCI | AUC | 0.880 | 0.889 | +0.009 |
| Average across Books/Movies/GCI | AUC | 0.824 | 0.891 | +0.067 |
| Ablation studies confirm the necessity of both attentive and non-attentive query components. | | | | |
| Average across Books/Movies/GCI | AUC | 0.575 | 0.886 | +0.311 |
| Average across Books/Movies/GCI | AUC | 0.877 | 0.886 | +0.009 |
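
The Δ column appears to be the reported AUC minus the baseline AUC. A minimal sketch for re-deriving it from the rows above, assuming that reading; the names `ROWS` and `delta` are illustrative and not taken from the paper:

```python
from decimal import Decimal

# (baseline AUC, reported AUC, reported delta), copied from the table in row order.
# The first four rows are the main comparison, the last two are the ablations.
ROWS = [
    ("0.850", "0.886", "+0.036"),
    ("0.867", "0.895", "+0.028"),
    ("0.880", "0.889", "+0.009"),
    ("0.824", "0.891", "+0.067"),
    ("0.575", "0.886", "+0.311"),
    ("0.877", "0.886", "+0.009"),
]


def delta(baseline: str, reported: str) -> str:
    """Return reported - baseline as a signed string with three decimals."""
    # Decimal avoids binary floating-point noise in three-decimal arithmetic.
    diff = Decimal(reported) - Decimal(baseline)
    return f"{diff:+.3f}"


# Check that every Δ entry matches the subtraction under the assumed definition.
for baseline, reported, expected in ROWS:
    assert delta(baseline, reported) == expected
```

Under this assumption, all six Δ entries match the subtraction.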