← Back to Paper List

Enhancing Uncertainty-Based Hallucination Detection with Stronger Focus

Tianhang Zhang, Lin Qiu, Qipeng Guo, Cheng Deng, Yue Zhang, Zheng Zhang, Cheng Zhou, Xinbing Wang, Luoyi Fu
Shanghai Jiaotong University, Amazon AWS AI, Westlake University, IGSNRR, Chinese Academy of Sciences
Conference on Empirical Methods in Natural Language Processing (2023)
Factuality Benchmark

📝 Paper Summary

Hallucination suppression Hallucination detection
A reference-free hallucination detection method that calculates uncertainty using a proxy model while mimicking human focus on keywords, historical context propagation, and entity types.
Core Problem
Existing hallucination detection methods rely on costly external retrieval or inefficient sampling of multiple responses, while naive uncertainty metrics (like average entropy) fail due to model overconfidence or underconfidence.
Why it matters:
  • Retrieval-based methods require external knowledge bases that may not be accessible or up-to-date
  • Sampling-based methods (e.g., SelfCheckGPT) are computationally expensive and inefficient for real-time applications
  • Standard probability metrics from proxy models are noisy because they include uninformative tokens and suffer from exposure bias
Concrete Example: In a biography, a model might confidently generate '2012 Summer Olympics' (high probability) because it attended strongly to a previous hallucinated mention of '2012', creating a cascade of errors that naive uncertainty metrics miss.
Key Novelty
Focus-driven Uncertainty Quantification
  • Focus on informative keywords: Calculates hallucination scores only on named entities and nouns rather than all tokens to reduce noise
  • Focus on preceding words: Propagates uncertainty from previous unreliable tokens to current ones via attention weights to penalize 'overconfident' cascades
  • Focus on token properties: Adjusts probabilities using entity type constraints and token frequency (IDF) to mitigate 'underconfidence' where valid rare tokens get low scores
Architecture
Architecture Figure Figure 1
Conceptual comparison between naive proxy model uncertainty and the proposed 'Focus' method.
Evaluation Highlights
  • Achieves 89.79 AUC-PR on WikiBio GPT-3 sentence-level detection with LLaMA-30b, surpassing SelfCheckGPT-Combination (87.33)
  • Improves Pearson correlation with human judgment to 77.15 (vs. 69.05 for SelfCheckGPT) on passage-level detection
  • LLaMA-7b with the proposed 'Focus' method outperforms GPT-3's own uncertainty metrics (84.26 vs 83.21 AUC-PR), showing effectiveness even with smaller proxy models
Breakthrough Assessment
7/10
Strong methodological contribution by refining uncertainty estimation without external resources. Outperforms SOTA baselines (SelfCheckGPT) efficiently, though relies on proxy model quality.
×