
CrAM: Credibility-Aware Attention Modification in LLMs for Combating Misinformation in RAG

B Deng, W Wang, F Zhu, Q Wang, F Feng
Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, University of Science and Technology of China, Harbin Institute of Technology (Shenzhen), Peng Cheng Laboratory
arXiv, June 2024
RAG QA

📝 Paper Summary

Modularized RAG pipeline Answer generation
CrAM mitigates RAG hallucinations caused by misinformation: it identifies the attention heads that most influence the output and, at inference time, scales down the attention those heads give to low-credibility documents, with no fine-tuning required.
Core Problem
RAG systems often retrieve documents containing misinformation, which misleads LLMs into generating incorrect answers because standard models lack mechanisms to down-weight low-credibility sources.
Why it matters:
  • Maliciously generated misinformation in external corpora can significantly degrade LLM performance
  • Simply filtering out low-credibility documents risks losing relevant information, leading to inferior performance compared to soft adjustment
  • Existing solutions like Supervised Fine-Tuning (SFT) require expensive resources and curated data, limiting applicability
Concrete Example: Given the question 'Who was the first person to win the Nobel Prize in Physics?', a retrieval system might fetch a misinformation document claiming it was Einstein (the correct answer is Roentgen). A standard LLM attends to this false document and answers 'Einstein', whereas CrAM suppresses attention to the false document based on its low credibility score.
Key Novelty
Credibility-aware Attention Modification (CrAM)
  • Identify 'influential' attention heads that contribute most to generating incorrect answers when misinformation is present, using a modified causal tracing method
  • Modify the attention weights of these specific heads during inference by element-wise multiplication with normalized document credibility scores
  • Allows the LLM to 'pay less attention' to tokens from low-credibility documents without retraining the model or discarding the documents entirely
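The attention-modification step described above can be sketched in NumPy as follows. This is an illustrative sketch, not the authors' implementation: the function names, the choice to normalize credibility by dividing by the maximum score, and the row re-normalization after scaling are all assumptions made for this example.

```python
import numpy as np

def normalize_credibility(raw_scores):
    """Scale raw credibility scores so the most credible document gets 1.0.
    Assumption: division by the maximum; the exact normalization is not given here."""
    raw_scores = np.asarray(raw_scores, dtype=float)
    top = raw_scores.max()
    return raw_scores / top if top > 0 else np.ones_like(raw_scores)

def modify_attention(attn, token_doc_ids, doc_scores, heads):
    """Down-weight attention to tokens from low-credibility documents.

    attn:          (num_heads, q_len, k_len) post-softmax attention weights
    token_doc_ids: (k_len,) document index of each key token, -1 for
                   tokens outside any document (question, instructions)
    doc_scores:    (num_docs,) normalized credibility scores in [0, 1]
    heads:         indices of the heads to modify (hypothetical selection)
    """
    attn = np.array(attn, dtype=float)  # work on a copy
    token_doc_ids = np.asarray(token_doc_ids)
    doc_scores = np.asarray(doc_scores, dtype=float)

    # Per-token multiplier: 1.0 for non-document tokens, credibility otherwise.
    scale = np.ones(attn.shape[-1])
    in_doc = token_doc_ids >= 0
    scale[in_doc] = doc_scores[token_doc_ids[in_doc]]

    for h in heads:
        scaled = attn[h] * scale  # broadcasts over query rows
        # Assumption: re-normalize each row so it sums to 1 again.
        row_sums = np.maximum(scaled.sum(axis=-1, keepdims=True), 1e-12)
        attn[h] = scaled / row_sums
    return attn
```

In a real model this transformation would be applied inside the selected heads during the forward pass; heads that were not selected keep their original attention weights.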
Architecture
Figure 2. The CrAM workflow has two stages: (1) identifying influential attention heads by causal tracing on a small set of examples, and (2) modifying the attention weights of those heads during inference based on document credibility scores.
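The head-identification stage can be sketched as a ranking problem. The example below assumes that, for each example in the small set, the probability of the incorrect answer has already been measured twice: once with unmodified attention, and once per head with that head's attention to the misinformation tokens scaled down. The function name, input layout, and top-k selection rule are hypothetical and only illustrate the idea; the paper's modified causal tracing procedure may differ in its details.

```python
import numpy as np

def select_influential_heads(p_wrong_base, p_wrong_modified, top_k):
    """Rank heads by how much intervening on them lowers the wrong-answer probability.

    p_wrong_base:     (num_examples,) probability of the incorrect answer
                      with unmodified attention
    p_wrong_modified: (num_examples, num_layers, num_heads) the same
                      probability when a single head is intervened on
    top_k:            number of heads to keep
    Returns a list of (layer, head) pairs, most influential first.
    """
    base = np.asarray(p_wrong_base, dtype=float)[:, None, None]
    modified = np.asarray(p_wrong_modified, dtype=float)

    # Average drop in wrong-answer probability attributable to each head.
    effect = (base - modified).mean(axis=0)

    # Sort all (layer, head) cells by effect, largest first.
    order = np.argsort(effect, axis=None)[::-1][:top_k]
    layers, heads = np.unravel_index(order, effect.shape)
    return [(int(l), int(h)) for l, h in zip(layers, heads)]
```

The returned pairs would then be the only heads whose attention is modified at inference time.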
Evaluation Highlights
  • +31.9% Exact Match (EM) improvement over the prompt-based baseline on TriviaQA using Llama2-13B in the presence of misinformation
  • +21.1% EM improvement over Naive RAG on Natural Questions using Llama2-13B when one misinformation document is present
  • Surpasses Supervised Fine-Tuning (SFT) methods such as CAG in robustness against misinformation while remaining training-free
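For readers unfamiliar with the Exact Match metric cited above, here is a minimal sketch of one common way to compute it. The normalization rules (lowercasing, stripping punctuation and English articles) follow the widely used SQuAD-style convention and are an assumption; the paper's exact scoring script may differ.

```python
import re
import string

def normalize_answer(text):
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, gold_answers):
    """True if the normalized prediction equals any normalized gold answer."""
    pred = normalize_answer(prediction)
    return any(pred == normalize_answer(gold) for gold in gold_answers)

def em_score(predictions, gold_answer_lists):
    """Percentage of predictions that exactly match a gold answer."""
    hits = sum(exact_match(p, g) for p, g in zip(predictions, gold_answer_lists))
    return 100.0 * hits / len(predictions)
```

Under this definition, a model misled into answering 'Einstein' when the gold answer is 'Roentgen' scores zero for that question.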
Breakthrough Assessment
7/10
Effective plug-and-play solution for a critical RAG problem (misinformation). Outperforming SFT methods without training is significant, though the method depends on externally supplied credibility scores.