
From RAGs to rich parameters: Probing how language models utilize external knowledge over parametric information for factual queries

H Wadhwa, R Seetharaman, S Aggarwal…
University of Massachusetts, Amherst, Microsoft, University of Maryland, College Park
arXiv, 6/2024 (2024)
RAG Factuality Memory

📝 Paper Summary

Mechanistic Interpretability Retrieval-Augmented Generation (RAG)
Mechanistic probing reveals that language models exhibit a 'shortcut' behavior in RAG settings, bypassing internal parametric knowledge in favor of attending directly to context tokens.
Core Problem
While RAG is widely used to mitigate hallucinations, it is unclear mechanistically how models balance their internal parametric knowledge against external retrieved context when answering factual queries.
Why it matters:
  • Understanding the interplay between internal priors and external context is crucial for preventing model drift and ensuring robust reasoning
  • Existing knowledge editing techniques focus on updating parameters, but lack insight into how RAG context overrides these parameters dynamically during inference
  • Blindly trusting RAG without understanding the mechanism can lead to inconsistent predictions even with perfect retrieval
Concrete Example: For the query 'The Space Needle is located in the city of', a vanilla model relies on its internal MLPs to retrieve 'Seattle'. When RAG context is added, the model may ignore its internal knowledge entirely and copy the answer from the context, but the internal mechanism behind this switch had not previously been demonstrated.
Key Novelty
Mechanistic Evidence of RAG Shortcuts
  • Demonstrates via Causal Tracing that the average indirect effect of subject tokens (which usually trigger fact retrieval) drops significantly when RAG context is present
  • Uses Attention Knockouts to show that the model's last token stops attending to the query subject and instead attends strongly to the answer token in the context
  • Quantifies the 'shortcut' mechanism: models effectively turn off their internal factual retrieval circuits in favor of a copy-mechanism from the context
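
The knockout idea in the bullets above can be illustrated with a toy, single-head sketch: the attention edge from the last token to the query subject is blocked by setting its pre-softmax score to negative infinity, and the remaining weights are renormalized. The token layout and the scores below are hypothetical values chosen only for illustration; they are not taken from the paper, and a real knockout is applied inside the model across heads and layers, with the effect measured on the prediction probability rather than on the attention weights.

```python
import math

def softmax(scores):
    """Numerically stable softmax that tolerates -inf entries."""
    finite_max = max(s for s in scores if s != float("-inf"))
    exps = [math.exp(s - finite_max) if s != float("-inf") else 0.0 for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def last_token_attention(scores, blocked=()):
    """Attention weights of the last token over all positions.

    `scores` are pre-softmax attention logits from the last token to each
    position; positions listed in `blocked` are knocked out (logit -> -inf).
    """
    masked = [float("-inf") if i in blocked else s for i, s in enumerate(scores)]
    return softmax(masked)

# Hypothetical token layout and logits (assumptions for illustration only).
# The context answer token gets the largest score, mimicking the RAG setting.
tokens = ["ctx_answer", "ctx_other", "subject", "query_other", "last"]
scores = [4.0, 0.5, 1.0, 0.5, 0.0]

baseline = last_token_attention(scores)
knocked = last_token_attention(scores, blocked={2})  # block last -> subject

for name, b, k in zip(tokens, baseline, knocked):
    print(f"{name:12s} baseline={b:.3f} knockout={k:.3f}")
```

In this toy setup the subject already receives little attention, so removing it barely shifts the weight on the context answer, which is the qualitative pattern the summary describes for the RAG setting.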
Evaluation Highlights
  • In Llama-2 (7B), the Average Indirect Effect (AIE) of subject tokens on the prediction drops ~5x (from ~0.20 to ~0.0375) when RAG context is introduced
  • For Phi-2, Attention Contribution from the query Subject Token to the Last Token drops ~7x in the RAG setting (10.7 vs 72.6 in vanilla)
  • Knocking out attention from the subject token reduces prediction probability by <5% in RAG settings, compared to ~20-25% in vanilla settings, indicating that the prediction relies on the context rather than the query subject
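
As a rough sketch of how the figures above fit together: in causal tracing, the indirect effect of a hidden state is commonly defined as the probability of the correct answer after restoring that state in a corrupted run, minus the probability in the corrupted run alone, and the AIE averages this over prompts. The per-prompt probabilities below are hypothetical numbers invented for illustration, not results from the paper; only the final two ratios use figures reported in this summary.

```python
def indirect_effect(p_corrupted, p_restored):
    """Indirect effect of one hidden state: answer probability after restoring
    that state in the corrupted run, minus the corrupted-run baseline."""
    return p_restored - p_corrupted

def average_indirect_effect(runs):
    """Mean indirect effect over (p_corrupted, p_restored) pairs, one per prompt."""
    effects = [indirect_effect(pc, pr) for pc, pr in runs]
    return sum(effects) / len(effects)

# Hypothetical per-prompt probabilities (assumptions, not measured values).
vanilla_runs = [(0.05, 0.30), (0.10, 0.25), (0.02, 0.22)]
rag_runs = [(0.60, 0.64), (0.55, 0.58), (0.70, 0.74)]

aie_vanilla = average_indirect_effect(vanilla_runs)
aie_rag = average_indirect_effect(rag_runs)
print(f"toy AIE: vanilla={aie_vanilla:.3f}, RAG={aie_rag:.3f}")

# Ratios implied by the figures reported in the summary above.
llama_ratio = 0.20 / 0.0375   # Llama-2 (7B) subject-token AIE, vanilla vs RAG
phi2_ratio = 72.6 / 10.7      # Phi-2 subject-to-last attention contribution
print(f"Llama-2 AIE ratio: {llama_ratio:.2f}x")
print(f"Phi-2 attention contribution ratio: {phi2_ratio:.2f}x")
```

The two ratios are what the "~5x" and "~7x" figures round from.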
Breakthrough Assessment
7/10
Provides the first mechanistic proof of 'shortcut' behavior in RAG, validating common intuitions with hard evidence from causal tracing and attention analysis.