โ† Back to Paper List

HalluGuard: Demystifying Data-Driven and Reasoning-Driven Hallucinations in LLMs

Xinyue Zeng, Junhong Lin, Yujun Yan, Feng Guo, Liang Shi, Jun Wu, Dawei Zhou
CS Department, EECS Department, Dartmouth College, Statistics Department, Michigan State University
arXiv (2026)
Factuality, Reasoning, Benchmark

๐Ÿ“ Paper Summary

Hallucination detection, Model interpretability / Theoretical analysis
The paper introduces HalluGuard, an NTK-based metric that unifies data-driven and reasoning-driven hallucination detection by analyzing training-time semantic gaps and inference-time instability without external references.
Core Problem
Existing detection methods typically address only one source of hallucination (data flaws OR reasoning failures) and rely on task-specific heuristics or external retrieval, limiting generalization.
Why it matters:
  • Hallucinations in high-stakes domains like healthcare and law can lead to severe consequences (e.g., incorrect diagnoses delaying treatment)
  • Hallucinations often evolve during generation, shifting from data errors to reasoning failures, which single-source detectors fail to capture
  • Reliance on external references or heavy sampling makes deployment inefficient and brittle in complex scenarios
Concrete Example: A medical model might misclassify a disease because of biases in its training data (data-driven), and that error then triggers a logical breakdown in the treatment plan (reasoning-driven). Existing detectors might catch either the initial bias or the logic error, but not the compounding risk as one evolves into the other.
Key Novelty
Hallucination Risk Bound & HalluGuard
  • Theoretically decomposes hallucination risk into two terms: a data-driven term (semantic approximation gap) and a reasoning-driven term (inference instability)
  • Uses Neural Tangent Kernel (NTK) geometry to proxy these terms: the determinant of the NTK Gram matrix captures representational quality, while Jacobian spectral norms capture reasoning stability
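To make the two proxies concrete, here is a minimal NumPy sketch on a toy two-layer tanh network standing in for an LLM. It is not the HalluGuard implementation. The summary only says that the NTK Gram determinant reflects representational quality and that Jacobian spectral norms reflect reasoning stability. Everything else is an assumption made for illustration: the toy model, the "trace" NTK over vector outputs, the use of a regularized log-determinant, taking the Jacobian with respect to the input, and the additive combination with weight `lam`. The helper names (`init_params`, `param_jacobian`, `input_jacobian`, `ntk_gram`, `ntk_logdet`, `max_jacobian_spectral_norm`, `toy_risk_score`) are likewise hypothetical.

```python
import numpy as np

# HYPOTHETICAL sketch: a toy network and made-up helper names, not the paper's code.

def init_params(d_in, d_hidden, d_out, seed=0):
    """Random weights for a toy 2-layer tanh network f(x) = W2 @ tanh(W1 @ x)."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=1.0 / np.sqrt(d_in), size=(d_hidden, d_in))
    W2 = rng.normal(scale=1.0 / np.sqrt(d_hidden), size=(d_out, d_hidden))
    return W1, W2

def forward(params, x):
    W1, W2 = params
    return W2 @ np.tanh(W1 @ x)

def param_jacobian(params, x):
    """d f(x) / d theta, shape (d_out, n_params), computed analytically."""
    W1, W2 = params
    h = np.tanh(W1 @ x)
    k = W2.shape[0]
    g = W2 * (1.0 - h**2)                   # (k, m): d f_a / d pre-activation_c
    dW1 = g[:, :, None] * x[None, None, :]  # (k, m, d): d f_a / d W1[c, e]
    dW2 = np.zeros((k,) + W2.shape)         # (k, k, m): d f_a / d W2[b, c]
    dW2[np.arange(k), np.arange(k), :] = h  # nonzero only when b == a
    return np.concatenate([dW1.reshape(k, -1), dW2.reshape(k, -1)], axis=1)

def input_jacobian(params, x):
    """d f(x) / d x, shape (d_out, d_in)."""
    W1, W2 = params
    h = np.tanh(W1 @ x)
    return (W2 * (1.0 - h**2)) @ W1

def ntk_gram(params, X):
    """Empirical trace-NTK Gram matrix: K[i, j] = <J(x_i), J(x_j)> (Frobenius)."""
    F = np.stack([param_jacobian(params, x).ravel() for x in X])
    return F @ F.T

def ntk_logdet(params, X, eps=1e-6):
    """log det(K + eps*I); eps keeps the PSD Gram matrix strictly positive definite."""
    K = ntk_gram(params, X)
    _sign, logdet = np.linalg.slogdet(K + eps * np.eye(len(X)))
    return logdet

def max_jacobian_spectral_norm(params, X):
    """Largest singular value of the input Jacobian over the batch."""
    return max(np.linalg.norm(input_jacobian(params, x), 2) for x in X)

def toy_risk_score(params, X, lam=1.0):
    """HYPOTHETICAL combination (not HalluGuard's formula): a small log-det
    (degenerate representations) and a large Jacobian norm (unstable mapping)
    both raise the score."""
    return -ntk_logdet(params, X) / len(X) + lam * max_jacobian_spectral_norm(params, X)
```

Usage: `toy_risk_score(init_params(8, 16, 4), np.random.default_rng(1).normal(size=(5, 8)))`. The sketch uses `np.linalg.slogdet` instead of `np.linalg.det` because a raw determinant of a Gram matrix over- or underflows quickly as the batch grows, while its logarithm stays well scaled.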
Evaluation Highlights
  • HalluGuard achieves state-of-the-art detection performance across 10 diverse benchmarks and 9 LLM backbones.
  • Consistently outperforms 11 competitive baselines including SelfCheckGPT, semantic entropy, and various uncertainty measures.
  • Reports strong correlations between the NTK determinant and data-centric tasks (0.84 on SQuAD), and between the spectral proxy and reasoning tasks (0.88 on MATH-500).
Breakthrough Assessment
8/10
Strong theoretical contribution uniting two disparate hallucination types under one framework, backed by a practical, reference-free metric that achieves SOTA results.