
HD-NDEs: Neural Differential Equations for Hallucination Detection in LLMs

Qing Li, Jiahui Geng, Zongxiong Chen, Derui Zhu, Yuxia Wang, Congbo Ma, Chenyang Lyu, Fakhri Karray
Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), Fraunhofer Institute for Open Communication Systems (FOKUS), Technical University of Munich, New York University Abu Dhabi, Alibaba International Digital Commerce
arXiv (2025)
Factuality Benchmark

📝 Paper Summary

Hallucination suppression, White-box hallucination detection
HD-NDEs detects hallucinations by modeling the sequence of LLM internal states as continuous dynamic trajectories using Neural Differential Equations, rather than relying solely on the final token's representation.
Core Problem
Existing classification-based methods often rely on the hidden state of the final token to detect hallucinations, failing when non-factual information appears early or mid-sequence.
Why it matters:
  • Hallucinations in Large Language Models (LLMs) limit real-world deployment by producing inaccurate or non-factual statements
  • Current methods struggle to capture the reliability of the entire sequence if the error occurs before the final token, reducing detection accuracy
  • Verification via external retrieval is computationally expensive and slow for high-throughput applications
Concrete Example: When an LLM answers a question incorrectly (e.g., 'The first virus was discovered by...'), the hidden state of the *last* token can look nearly identical to that of a correct answer (as the paper's PCA analysis shows), even though the hidden states of the middle tokens diverge significantly.
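The blind spot described in the example can be reproduced with a toy sketch. Everything below is an assumption made for illustration: the 2-D "hidden states", the probe weights, and the helper names `last_token_score` and `path_length` are hypothetical and do not come from the paper. The two trajectories end at the same point, so a final-token probe cannot separate them, while even a crude whole-trajectory statistic can.

```python
# Toy illustration only: 2-D "hidden states" and probe weights are made up.

def last_token_score(states, w):
    """Linear probe applied to the final token's hidden state only."""
    last = states[-1]
    return sum(wi * hi for wi, hi in zip(w, last))


def path_length(states):
    """Total Euclidean distance travelled by the hidden-state trajectory."""
    total = 0.0
    for prev, cur in zip(states, states[1:]):
        total += sum((c - p) ** 2 for p, c in zip(prev, cur)) ** 0.5
    return total


# Two hypothetical answers whose trajectories share start and end points
# but diverge at the middle token.
correct = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
wrong = [(0.0, 0.0), (1.0, 3.0), (2.0, 0.0)]
probe = (0.5, -0.25)  # hypothetical probe weights

# Same final state, so the final-token probe gives the same score for both.
print(last_token_score(correct, probe), last_token_score(wrong, probe))
# The trajectory statistic differs because it sees the middle token.
print(path_length(correct), path_length(wrong))
```

Path length is only a stand-in here for "some feature of the whole trajectory"; it is not the feature HD-NDEs learns.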
Key Novelty
Hallucination Detection via Neural Differential Equations (HD-NDEs)
  • Treats the sequence of token hidden states as a continuous-time dynamic system rather than discrete, independent points
  • Uses Neural ODEs, CDEs, and SDEs to model the 'trajectory' of thought within the LLM's latent space, capturing how information evolves over the entire generation process
  • Maps this full dynamic trajectory to a classification space to determine truthfulness, capturing early-sequence errors that final-token classifiers miss
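To make the trajectory idea concrete, here is a minimal sketch of one of the three variants, a controlled differential equation, discretized with Euler steps: z_{k+1} = z_k + F(z_k)(x_{k+1} - x_k), where x_k is the hidden state of token k and z is a latent state read out by a sigmoid classifier. The sizes, the random untrained weights, the tanh vector field, and all function names are assumptions chosen for illustration, not the authors' architecture or solver, so the printed probability carries no meaning beyond showing the data flow.

```python
import math
import random

# Assumptions for illustration: toy sizes, random untrained weights, and an
# Euler discretization. None of these choices come from the paper.
D_X, D_Z = 3, 4  # hidden-state size and latent-state size (hypothetical)

rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


W_INIT = rand_matrix(D_Z, D_X)          # maps the first hidden state to z_0
W_FIELD = rand_matrix(D_Z * D_X, D_Z)   # weights of the vector field F
B_FIELD = [rng.uniform(-0.5, 0.5) for _ in range(D_Z * D_X)]
W_OUT = [rng.uniform(-0.5, 0.5) for _ in range(D_Z)]  # readout weights


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def vector_field(z):
    """Return F(z) as a D_Z x D_X matrix with tanh-bounded entries."""
    flat = [math.tanh(a + b) for a, b in zip(matvec(W_FIELD, z), B_FIELD)]
    return [flat[i * D_X:(i + 1) * D_X] for i in range(D_Z)]


def cde_final_state(states):
    """Euler steps z_{k+1} = z_k + F(z_k)(x_{k+1} - x_k) along the sequence."""
    z = matvec(W_INIT, states[0])
    for prev, cur in zip(states, states[1:]):
        dx = [c - p for p, c in zip(prev, cur)]
        z = [zi + dzi for zi, dzi in zip(z, matvec(vector_field(z), dx))]
    return z


def truthfulness_probability(states):
    """Sigmoid readout of the final latent state."""
    logit = sum(w * zi for w, zi in zip(W_OUT, cde_final_state(states)))
    return 1.0 / (1.0 + math.exp(-logit))


# One toy trajectory of three token hidden states (made-up numbers).
trajectory = [[0.1, 0.2, -0.1], [0.4, -0.3, 0.2], [0.0, 0.5, 0.3]]
print(truthfulness_probability(trajectory))
```

Because every increment between consecutive hidden states drives the latent state, changing a middle token changes the final latent state even when the first and last hidden states are left untouched. In practice the weights would be trained on labeled true/false examples and the integration handled by a differential-equation solver library rather than a hand-written Euler loop.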
Evaluation Highlights
  • Achieves over 14% improvement in AUC-ROC on the True-False dataset compared to state-of-the-art techniques
  • Consistently outperforms baseline methods (such as SAPLMA and ITI) across five datasets (TruthfulQA, SQuAD, etc.) and six LLMs (including LLaMA-2-7B and Vicuna-7B)
  • Neural CDEs generally yield the highest detection performance among the three differential equation variants (ODE, CDE, SDE) tested
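For readers unfamiliar with the metric behind these numbers, AUC-ROC can be read as the probability that a randomly chosen positive example receives a higher detector score than a randomly chosen negative one. The sketch below computes it by direct pair counting. The labels and scores are made up for illustration and are not results from the paper; treating label 1 as "hallucinated" is likewise an assumption.

```python
def auc_roc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = hallucinated) and detector scores, for illustration only.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.8, 0.2]
print(auc_roc(labels, scores))  # 8 of 9 pairs are ordered correctly
```

The pair-counting form is quadratic in the number of examples; it is used here for clarity, and a sort-based implementation from a standard metrics library would be the usual choice on real data.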
Breakthrough Assessment
7/10
Novel application of Neural Differential Equations to the specific problem of hallucination detection. The theoretical motivation (modeling dynamics) addresses a clear weakness in prior snapshot-based methods, and the reported empirical gains are substantial (over 14% AUC-ROC improvement on the True-False dataset).