
Retrieval Augmentation Reduces Hallucination in Conversation

Kurt Shuster, Spencer Poff, Moya Chen, Douwe Kiela, Jason Weston
Facebook AI Research
EMNLP (2021)
Tags: RAG · Factuality · QA

📝 Paper Summary

Topics: Modularized RAG Pipeline · Factuality in Dialogue
The authors adapt neural retrieval-augmented generation (RAG) architectures for open-domain dialogue, demonstrating that conditioning generation on retrieved knowledge significantly reduces hallucination compared to standard large language models.
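The retrieve-then-generate idea can be illustrated with a minimal sketch. The retriever and generator below are toy stand-ins (bag-of-words overlap and a template reply, not the paper's DPR retriever or BART generator); the point is only the structure: score documents against the dialogue context, then condition the response on the top document rather than on model-internal "memory".

```python
# Toy sketch of retrieval-augmented generation for dialogue.
# Assumption: 'score', 'retrieve', and 'generate' are illustrative names,
# not the paper's actual components (DPR retriever, BART generator).
from collections import Counter

def score(query: str, doc: str) -> float:
    """Bag-of-words overlap between query and document (toy retriever)."""
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    return sum((q & d).values())

def retrieve(context: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the top-k documents for the dialogue context."""
    return sorted(docs, key=lambda d: score(context, d), reverse=True)[:k]

def generate(context: str, docs: list[str]) -> str:
    """Ground the reply in retrieved text instead of implicit parameters."""
    evidence = retrieve(context, docs, k=1)[0]
    return f"Based on what I found: {evidence}"

docs = [
    "Kyunghyun Cho is a professor of computer science at New York University.",
    "The Eiffel Tower is located in Paris, France.",
]
reply = generate("Tell me about Kyunghyun Cho", docs)
```

A non-augmented model in the same situation can only sample from whatever its weights happen to encode about the entity, which is where hallucinations enter.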
Core Problem
State-of-the-art dialogue models, despite their fluency, frequently 'hallucinate' knowledge—generating plausible but factually incorrect statements—because they rely solely on knowledge stored implicitly in their parameters rather than on external grounding.
Why it matters:
  • Large language models (like GPT-3) mix up facts between similar entities or make subtle token errors that render statements false
  • Existing RAG methods work for QA but struggle with complex multi-turn dialogue contexts, which require maintaining conversational flow alongside factuality
Concrete Example: When asked about 'Kyunghyun Cho', GPT-3 hallucinates that he is the 'most intelligent person on Earth', an 'ex-Go champion', and won awards he never won (e.g., NIPS 2013 Best Paper), whereas a retrieval-augmented model would ground the response in retrieved Wikipedia facts.
Key Novelty
Neural-Retriever-in-the-Loop for Dialogue
  • Adapts RAG and Fusion-in-Decoder (FiD) architectures specifically for multi-turn dialogue rather than just QA
  • Introduces 'RAG-Turn' to retrieve documents per dialogue turn, balancing local relevance with global context
  • Enhances retrieval via Poly-encoder re-ranking to allow finer-grained interaction between dialogue context and candidate documents
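The per-turn retrieval idea behind 'RAG-Turn' can be sketched as follows: query the retriever once per dialogue turn and pool the results, so the generator sees documents relevant to each turn rather than to a single fused context. The overlap scorer is a toy stand-in for the paper's neural retriever, and `retrieve_per_turn` is an illustrative name, not the paper's API.

```python
# Hedged sketch of per-turn retrieval ('RAG-Turn'): retrieve for each
# dialogue turn separately, then merge the pooled documents.
from collections import Counter

def overlap(a: str, b: str) -> int:
    """Toy lexical-overlap scorer standing in for a neural (DPR) retriever."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    return sum((ca & cb).values())

def retrieve_per_turn(turns: list[str], docs: list[str], k: int = 1) -> list[str]:
    """Retrieve top-k docs for each turn, deduplicating while keeping order."""
    pooled: list[str] = []
    for turn in turns:
        top = sorted(docs, key=lambda d: overlap(turn, d), reverse=True)[:k]
        for doc in top:
            if doc not in pooled:
                pooled.append(doc)
    return pooled

turns = ["Who directed Alien?", "And what is the movie about?"]
docs = [
    "Alien was directed by Ridley Scott.",
    "Alien follows the crew of the spaceship Nostromo.",
    "Bananas are rich in potassium.",
]
pooled = retrieve_per_turn(turns, docs)
```

Retrieving over the concatenated context alone would let the most recent turn dominate; pooling per-turn results keeps documents relevant to earlier turns in play, which is the balance between local relevance and global context the summary describes.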
Evaluation Highlights
  • Reduces hallucinated responses by over 60% compared to standard large language models according to human evaluations on Wizard of Wikipedia
  • Achieves state-of-the-art F1 scores on Wizard of Wikipedia (Test Unseen) with the RAG DPR-Poly model (+3.1 F1 over non-augmented BART)
  • Demonstrates superior generalization: relative Knowledge F1 gains over non-augmented baselines are larger on out-of-distribution topics (≈85%) than on in-distribution data (≈70%)
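The F1 numbers above are token-level unigram F1 between a generated response and a reference (for Knowledge F1, the reference is the gold knowledge sentence). A minimal sketch of the metric, assuming simple whitespace tokenization rather than the evaluation code's exact normalization:

```python
# Unigram F1: harmonic mean of token precision and recall between a
# prediction and a reference. Tokenization here is a simplification.
from collections import Counter

def unigram_f1(prediction: str, reference: str) -> float:
    pred, ref = prediction.lower().split(), reference.lower().split()
    common = Counter(pred) & Counter(ref)
    hits = sum(common.values())
    if hits == 0:
        return 0.0
    precision = hits / len(pred)
    recall = hits / len(ref)
    return 2 * precision * recall / (precision + recall)

score = unigram_f1(
    "ridley scott directed alien",
    "alien was directed by ridley scott",
)
```

Here all four predicted tokens appear in the six-token reference, giving precision 1.0 and recall 2/3, hence F1 = 0.8.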
Breakthrough Assessment
8/10
Significant for establishing neural retrieval as a standard for reducing hallucination in dialogue. Successfully adapts QA-centric RAG/FiD to conversational settings with novel turn-based retrieval strategies.