
AggTruth: Contextual Hallucination Detection using Aggregated Attention Scores in LLMs

Piotr Matys, Jan Eliasz, Konrad Kiełczyński, Mikołaj Langner, Teddy Ferdinan, Jan Kocoń, Przemysław Kazienko
Wrocław University of Science and Technology
arXiv (2025)
Factuality RAG QA Benchmark

📝 Paper Summary

Hallucination suppression, Modularized RAG pipeline
AggTruth detects hallucinations in RAG systems by aggregating internal attention scores over retrieved passages into lightweight features, enabling real-time classification without requiring multiple generations.
Core Problem
RAG systems still hallucinate when context is noisy or misused, and existing detection methods either require expensive multiple generations or lack robustness across different tasks.
Why it matters:
  • Hallucinations prevent the deployment of LLMs in high-stakes real-world applications where reliability is critical
  • Current state-of-the-art methods like Lookback-Lens rely on attention to the entire prompt (including system instructions), making them brittle to input changes
  • Generating multiple answers for consistency checks (e.g., self-consistency) is too slow and computationally expensive for real-time applications
Concrete Example: When an LLM answers a question based on a retrieved passage, it might generate a plausible but false entity. AggTruth detects this by observing that the model's internal attention heads fail to focus consistently on the relevant passage tokens during the generation of the false entity, unlike when generating factual content.
Key Novelty
AggTruth (Attention Aggregation for Truthfulness)
  • Instead of analyzing the full attention matrix, AggTruth uses only the attention scores directed at the retrieved passage (context) during token generation
  • Proposes four techniques for aggregating these sparse attention scores into dense feature vectors: Sum, Cosine Similarity, Entropy, and Jensen-Shannon Divergence
  • Introduces a 'Passage Percentage' feature to correct for the natural dilution of attention as generated sequences get longer
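The four aggregations named above can be sketched in plain Python. This is an illustrative sketch and not the authors' implementation: the summary does not say what the Cosine Similarity and Jensen-Shannon variants compare, so the sketch assumes each head is compared against the mean attention of all heads in the same layer. All function names here are hypothetical.

```python
import math

EPS = 1e-12  # guards against log(0) and division by zero


def normalize(scores):
    """Rescale non-negative scores so they sum to (approximately) 1."""
    total = sum(scores)
    return [s / (total + EPS) for s in scores]


def agg_sum(scores):
    """Total attention mass one head places on the passage tokens."""
    return sum(scores)


def agg_entropy(scores):
    """Shannon entropy (in nats) of the normalized passage attention."""
    return -sum(p * math.log(p + EPS) for p in normalize(scores))


def agg_cosine(scores, reference):
    """Cosine similarity between a head's scores and a reference vector."""
    dot = sum(a * b for a, b in zip(scores, reference))
    norm_a = math.sqrt(sum(a * a for a in scores))
    norm_b = math.sqrt(sum(b * b for b in reference))
    return dot / (norm_a * norm_b + EPS)


def _kl(p, q):
    """Kullback-Leibler divergence KL(p || q) for two distributions."""
    return sum(a * math.log((a + EPS) / (b + EPS)) for a, b in zip(p, q))


def agg_js(scores, reference):
    """Jensen-Shannon divergence between two normalized score vectors."""
    p, q = normalize(scores), normalize(reference)
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * _kl(p, m) + 0.5 * _kl(q, m)


def layer_features(head_scores):
    """Aggregate one layer's passage attention for one generated token.

    head_scores: one list per head, each holding that head's attention
    scores over the passage tokens.
    ASSUMPTION: cosine and JS compare each head to the layer's mean head.
    """
    n_heads = len(head_scores)
    mean_head = [sum(col) / n_heads for col in zip(*head_scores)]
    return [
        {
            "sum": agg_sum(h),
            "entropy": agg_entropy(h),
            "cosine": agg_cosine(h, mean_head),
            "js": agg_js(h, mean_head),
        }
        for h in head_scores
    ]
```

Each aggregation turns a variable-length vector of passage attention scores into a single number per head, so the feature vector has a fixed size regardless of passage length.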
Architecture
Figure 1
The conceptual framework of AggTruth. It illustrates how attention scores from generated tokens towards context tokens are extracted and aggregated.
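The extraction step, together with the 'Passage Percentage' correction listed under Key Novelty, might look roughly like the sketch below. The summary gives no formula, so two things here are assumptions: that passage percentage means the passage's share of all positions the current token can attend to, and that the correction divides the passage attention mass by that share. The function names are hypothetical.

```python
def passage_attention(attn_row, passage_start, passage_end):
    """Keep only the attention a generated token pays to the passage.

    attn_row: one head's attention weights from the token being generated
              to every earlier position (prompt plus tokens generated so far).
    passage_start, passage_end: half-open span [start, end) of the passage.
    """
    return attn_row[passage_start:passage_end]


def passage_percentage(passage_start, passage_end, attended_len):
    """ASSUMED definition: fraction of attended positions in the passage."""
    return (passage_end - passage_start) / attended_len


def length_corrected_sum(attn_row, passage_start, passage_end):
    """ASSUMED correction: passage attention mass divided by passage share.

    The passage share shrinks as more tokens are generated, so dividing by
    it offsets the dilution of raw attention mass over longer sequences.
    """
    mass = sum(passage_attention(attn_row, passage_start, passage_end))
    share = passage_percentage(passage_start, passage_end, len(attn_row))
    return mass / share


# Toy example: 5 attended positions, passage occupies positions 1 and 2.
row = [0.1, 0.3, 0.3, 0.1, 0.2]
ratio = length_corrected_sum(row, 1, 3)  # 0.6 mass over a 0.4 share
```

Under this reading, a ratio above 1 means the head attends to the passage more than a uniform spread over all positions would, and a ratio below 1 means it attends less.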
Evaluation Highlights
  • Outperforms SOTA (Lookback-Lens) on summarization tasks (CNN/DM, XSum) using Llama-3-8B-Instruct
  • Achieves competitive performance on QA tasks (Natural Questions, HotPotQA) while using significantly fewer features than hidden-state methods
  • Demonstrates robust cross-task generalization, maintaining high detection performance when trained on summarization and tested on QA (and vice versa)
Breakthrough Assessment
7/10
Solid methodological improvement for online hallucination detection. It simplifies feature extraction while improving robustness compared to Lookback-Lens, though it relies on standard classifiers.