
Revisiting Hallucination Detection with Effective Rank-based Uncertainty

Rui Wang, Zeming Wei, Guanzhang Yue, Meng Sun
Peking University
arXiv (2025)
Factuality QA Benchmark

📝 Paper Summary

Hallucination Suppression, Uncertainty Quantification (UQ)
The paper proposes a training-free hallucination detection method that quantifies uncertainty by calculating the effective rank of internal embedding matrices constructed from multiple model responses and layers.
Core Problem
Large language models (LLMs) often generate hallucinations, meaning outputs that are linguistically fluent but factually incorrect. Existing detection methods are either computationally expensive (e.g., ensembles) or depend on external tools (e.g., retrieval).
Why it matters:
  • Hallucinations in high-stakes domains like healthcare and science can be dangerous because they are often indistinguishable from trustworthy responses
  • Classical uncertainty quantification methods such as Monte Carlo dropout require many stochastic forward passes, which is computationally impractical for billion-parameter models
  • Token-level probability metrics capture lexical confidence (word choice) rather than semantic uncertainty (meaning), leading to failures where models are confidently wrong
Concrete Example: A model might generate a biography for a non-existent person with high token-level confidence because the sentence structure is predictable, even though the semantic content varies wildly across different generation attempts.
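To make the token-level failure mode concrete, here is a minimal sketch of a typical token-probability confidence score: mean token log-probability and its perplexity. The per-token probabilities are invented for illustration and are not taken from the paper. Because the score looks at each answer in isolation, two contradictory biographies can both appear confident.

```python
import math


def mean_token_logprob(token_probs: list[float]) -> float:
    """Average log-probability the model assigned to each generated token."""
    return sum(math.log(p) for p in token_probs) / len(token_probs)


def perplexity(token_probs: list[float]) -> float:
    """exp(-mean log-prob); 1.0 means every token had probability 1."""
    return math.exp(-mean_token_logprob(token_probs))


# Hypothetical per-token probabilities for two sampled biographies of the same
# non-existent person that state different birth years and places.
answer_a = [0.92, 0.88, 0.95, 0.90, 0.85]
answer_b = [0.91, 0.87, 0.94, 0.89, 0.86]
print(perplexity(answer_a), perplexity(answer_b))  # both roughly 1.1
```

Each answer gets a low perplexity, so a token-level detector would trust both. Detecting the contradiction requires comparing the answers against each other, which is what sampling-based semantic methods and the ER score do.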
Key Novelty
Effective Rank-based Uncertainty (ER)
  • Constructs a matrix using hidden state embeddings from multiple generated responses and specific layers
  • Uses the 'effective rank' of this matrix (derived from the entropy of its normalized singular values) to measure how semantically diverse the responses are
  • A low effective rank indicates that the model's internal states are consistent and confident (energy concentrated in a few directions), while a high effective rank indicates confusion and a likely hallucination (energy spread diffusely)
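The construction above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes one hidden-state vector per sampled response per layer (e.g., the final token's state), concatenates the selected layers for each response into one row, and L2-normalizes the rows. It then applies the common effective-rank definition from Roy and Vetterli (2007): the exponential of the Shannon entropy of the L1-normalized singular values. The function names, array layout, and normalization step are hypothetical choices.

```python
import numpy as np


def effective_rank(matrix: np.ndarray, eps: float = 1e-12) -> float:
    """exp of the Shannon entropy of the L1-normalized singular values.

    Ranges from 1 (a single direction) to min(matrix.shape) (energy spread
    evenly); returns 0.0 for an all-zero matrix.
    """
    s = np.linalg.svd(matrix, compute_uv=False)
    s = s[s > eps]  # drop numerically zero singular values
    if s.size == 0:
        return 0.0
    p = s / s.sum()  # singular values as a probability distribution
    entropy = -np.sum(p * np.log(p))
    return float(np.exp(entropy))


def er_uncertainty(hidden_states: np.ndarray, layers: list[int]) -> float:
    """Hypothetical ER score for one question.

    hidden_states: shape (num_responses, num_layers, hidden_dim), one embedding
    per sampled response per layer (assumed layout).
    """
    num_responses = hidden_states.shape[0]
    # One row per response: the selected layers are concatenated.
    rows = hidden_states[:, layers, :].reshape(num_responses, -1)
    # Unit-normalize rows so direction, not magnitude, drives the rank.
    norms = np.linalg.norm(rows, axis=1, keepdims=True)
    rows = rows / np.maximum(norms, 1e-12)
    return effective_rank(rows)


rng = np.random.default_rng(0)
base = rng.normal(size=(1, 3, 64))
consistent = base + 0.01 * rng.normal(size=(10, 3, 64))  # 10 near-identical samples
diverse = rng.normal(size=(10, 3, 64))                    # 10 unrelated samples
print(er_uncertainty(consistent, [1, 2]))  # slightly above 1: responses agree
print(er_uncertainty(diverse, [1, 2]))     # close to 10: responses disagree
```

With one row per response, the score ranges from 1 (all samples point the same way) up to the number of samples. Comparing it against a threshold, or ranking questions by it, gives a training-free detector.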
Evaluation Highlights
  • Achieves the highest AUROC in 8 out of 12 evaluation scenarios across the Llama-2-7b, Llama-2-13b, and Mistral-7B models
  • Outperforms strong baselines like Semantic Entropy on the TriviaQA dataset with Llama-2-13b-chat (85.29 AUROC vs 84.15)
  • Remains robust across sampling temperatures, significantly outperforming baselines at standard settings (T = 0.5 and T = 1.0)
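The AUROC figures above (e.g., 85.29 vs 84.15) are on a 0-100 scale, so they are AUROC multiplied by 100. The sketch below shows what the metric measures for an uncertainty-based detector. It assumes hallucinated answers are labeled 1 and that a higher uncertainty score should mean "more likely hallucinated". AUROC is then the probability that a randomly chosen hallucinated answer scores higher than a randomly chosen correct one, with ties counted as one half. The function and data are illustrative. The result should agree with scikit-learn's `roc_auc_score`.

```python
import numpy as np


def auroc(scores, labels) -> float:
    """Pairwise (Mann-Whitney) AUROC: P(score_pos > score_neg) + 0.5 * P(tie)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]  # hallucinated vs. correct
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one positive and one negative example")
    diff = pos[:, None] - neg[None, :]  # every (hallucinated, correct) pair
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


# Hypothetical uncertainty scores; label 1 = hallucinated answer.
scores = [0.9, 0.5, 0.3, 0.6]
labels = [1, 1, 0, 0]
print(auroc(scores, labels))  # 0.75: 3 of the 4 pairs are ranked correctly
```

A score of 0.5 means the detector ranks answers no better than chance, and 1.0 means every hallucinated answer is scored as more uncertain than every correct one.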
Breakthrough Assessment
7/10
Offers a mathematically elegant, training-free method based on internal states that matches or beats heavier semantic methods. However, it slightly trails Semantic Entropy on tasks like SQuAD.