
Hallucination Detection in LLMs with Topological Divergence on Attention Graphs

Alexandra Bazarova, Aleksandr Yugay, Andrey Shulga, Alina Ermilova, Andrei Volodichev, Konstantin Polev, Julia Belikova, Rauf Parchiev, Dmitry Simakov, Maxim Savchenko, Andrey Savchenko, Serguei Barannikov, Alexey Zaytsev
Applied AI Institute, SB AI Lab, CNRS, Université Paris Cité
arXiv (2025)
Factuality RAG QA Benchmark

📝 Paper Summary

Hallucination suppression
TOHA detects hallucinations in RAG systems by analyzing the topological divergence between prompt and response subgraphs within specific 'hallucination-aware' attention heads.
Core Problem
Existing hallucination detection methods are either computationally expensive (requiring multiple generations) or require large annotated datasets for supervised training, which are often scarce.
Why it matters:
  • Hallucinations undermine user trust in sensitive applications, necessitating reliable detection mechanisms for safe deployment
  • Computational overhead of sampling-based methods (like SelfCheckGPT) limits real-time applicability
  • Supervised methods struggle with domain transfer due to the scarcity of high-quality annotated hallucination datasets
Concrete Example: In a RAG scenario, a model may hallucinate an answer that is not present in the retrieved context while its token-probability scores remain high. TOHA detects this case because, in specific attention heads, the attention graph shows a high topological divergence (novelty) between the prompt and the generated response, signaling that the response is not grounded in the prompt.
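The decision step in this example can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes per-head divergences are already available as an array indexed by (layer, head), that heads are ranked by the mean divergence gap between hallucinated and grounded samples from a small annotated probe set, and that the final score is a plain average over the chosen heads. The names `select_heads`, `hallucination_score`, and the parameter `k` are hypothetical.

```python
import numpy as np

def select_heads(div_halluc, div_grounded, k=2):
    """Pick k (layer, head) pairs with the largest mean divergence gap.

    div_halluc, div_grounded: arrays of shape (n_samples, n_layers, n_heads)
    holding per-head divergences for hallucinated and grounded probe samples.
    Ranking by mean gap is an assumption made for this sketch.
    """
    gap = div_halluc.mean(axis=0) - div_grounded.mean(axis=0)
    # Flattened indices sorted by decreasing gap, truncated to the top k.
    top = np.argsort(gap, axis=None)[::-1][:k]
    return [tuple(int(i) for i in np.unravel_index(f, gap.shape)) for f in top]

def hallucination_score(div, heads):
    """Average the divergence of one sample over the selected heads.

    div: array of shape (n_layers, n_heads) for a single prompt/response pair.
    """
    return float(np.mean([div[layer, head] for layer, head in heads]))

# Toy probe set: one hallucinated and one grounded sample, 2 layers x 2 heads.
div_halluc = np.array([[[1.0, 3.0], [2.0, 1.0]]])
div_grounded = np.array([[[1.0, 1.0], [0.5, 1.0]]])
heads = select_heads(div_halluc, div_grounded, k=2)
score = hallucination_score(div_halluc[0], heads)
```

A response would then be flagged when `score` exceeds a threshold tuned on held-out data; the threshold itself is not specified in this summary.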
Key Novelty
TOpology-based HAllucination detector (TOHA)
  • Adapts Manifold Topology Divergence to graph structures (MTop-Div) to measure the topological dissimilarity between prompt and response tokens in attention maps
  • Identifies a small set of 'hallucination-aware' attention heads that consistently show higher divergence for hallucinations, regardless of the dataset
  • Interprets the divergence score as a measure of informational novelty: high divergence implies the response introduces information not topologically grounded in the prompt
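The divergence described in the bullets above can be approximated with a short sketch. It is a simplified illustration under stated assumptions, not the paper's exact procedure: it considers only zero-dimensional topology (connected components), it turns one head's attention matrix into distances via `1 - max(a_ij, a_ji)`, and it treats all prompt tokens as a single collapsed component, so the divergence becomes the total weight of the minimum spanning forest that attaches the response tokens to the prompt. The function names are hypothetical.

```python
import numpy as np

def attention_distance(attn):
    """Symmetrize one head's attention matrix and convert it to distances.

    Assumption for this sketch: d_ij = 1 - max(a_ij, a_ji), with zero diagonal.
    """
    sym = np.maximum(attn, attn.T)
    dist = 1.0 - sym
    np.fill_diagonal(dist, 0.0)
    return dist

def h0_divergence(attn, n_prompt):
    """Total edge weight needed to attach every response token to the prompt.

    The first n_prompt tokens are the prompt and start out merged into one
    component (prompt-to-prompt distance treated as zero). Prim's algorithm
    then grows the tree over the response tokens; the sum of accepted edge
    weights plays the role of the summed H0 bar lengths.
    """
    dist = attention_distance(attn)
    n = dist.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[:n_prompt] = True
    # Cheapest known link from the current tree to each vertex.
    best = dist[:n_prompt].min(axis=0)
    total = 0.0
    for _ in range(n - n_prompt):
        candidates = np.where(in_tree, np.inf, best)
        j = int(np.argmin(candidates))
        total += candidates[j]
        in_tree[j] = True
        best = np.minimum(best, dist[j])
    return float(total)

# Toy causal attention maps: 2 prompt tokens followed by 2 response tokens.
grounded = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.1, 0.8, 0.1, 0.0],   # response token attends mostly to the prompt
    [0.7, 0.1, 0.1, 0.1],
])
ungrounded = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.05, 0.05, 0.9, 0.0],  # response tokens attend mostly to each other
    [0.05, 0.05, 0.6, 0.3],
])
d_grounded = h0_divergence(grounded, n_prompt=2)
d_ungrounded = h0_divergence(ungrounded, n_prompt=2)
```

In this toy case the ungrounded response yields the larger value, matching the reading of divergence as informational novelty. The sketch does not normalize by response length; whether and how to do so is left open here.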
Architecture
Figure 1: Conceptual illustration of Attention Graph construction and MTop-Div calculation.
Evaluation Highlights
  • +11.7% improvement on MS MARCO (long-form QA) for Mistral-7B compared to state-of-the-art baselines
  • +21.6% improvement on CoQA (conversational QA) for LLaMA-2-7B compared to baselines
  • Operates ~7x faster than SelfCheckGPT (with 1 additional sample) and >70x faster than standard sampling-based configurations
Breakthrough Assessment
8/10
Offers a highly efficient, training-free method that matches or beats computationally expensive baselines. The application of TDA to attention graphs for this purpose is novel and theoretically grounded.