Tencent AI Lab,
Soochow University,
Zhejiang University,
Renmin University of China,
Nanyang Technological University,
Toyota Technological Institute at Chicago
arXiv
(2023)
Factuality, Benchmark
📝 Paper Summary
Hallucination survey, Factuality evaluation
This survey provides a comprehensive taxonomy of hallucinations in Large Language Models (LLMs), distinguishing them from traditional NLG errors, and reviews current detection, explanation, and mitigation strategies.
Core Problem
Large Language Models frequently generate content that conflicts with user input, previous context, or established facts, undermining their reliability in real-world applications.
Why it matters:
LLMs are trained on massive, uncurated web data containing fabricated or biased information, making hallucinations hard to eliminate at the source
The versatility of LLMs across tasks and domains makes comprehensive evaluation and mitigation significantly harder than in task-specific models
Hallucinations in critical fields like medicine or law can pose tangible real-world risks, because highly plausible but false statements are hard for users to notice
Concrete Example: When asked about the mother of Afonso II, an LLM might confidently answer 'Queen Urraca of Castile' instead of the correct 'Dulce Berenguer of Barcelona', a fact-conflicting hallucination that misleads users.
Key Novelty
LLM-Specific Hallucination Taxonomy
Redefines hallucination for the LLM era by categorizing it into three distinct types: input-conflicting, context-conflicting, and fact-conflicting
Differentiates hallucination from other common LLM issues like ambiguity, incompleteness, bias, and under-informativeness
Architecture
Organization of the survey paper, mapping out the lifecycle of LLMs and where hallucinations can be introduced and addressed
Evaluation Highlights
The paper is a survey and does not propose a new model or report novel experimental results; it aggregates existing benchmarks.
Provides a taxonomy of evaluation benchmarks and analyzes existing approaches for mitigation.
Discusses unique challenges posed by massive training data, versatility, and error imperceptibility in LLMs.
Breakthrough Assessment
7/10
A timely and comprehensive survey that clarifies definitions and categorizations in a rapidly evolving field, though it aggregates existing knowledge rather than proposing a new technical method.
⚙️ Technical Details
Problem Definition
Setting: Survey and taxonomy of Hallucination in Large Language Models
Inputs: Literature on LLM hallucination
Outputs: Taxonomy, analysis of methods, and future directions
The paper is a survey; it provides an open-source repository (https://github.com/HillZhang1999/llm-hallucination-survey) containing the list of papers and resources discussed.
📊 Experiments & Results
Evaluation Setup
Review of existing benchmarks and evaluation protocols
Metrics:
Statistical methodology: Not explicitly reported in the paper
Main Takeaways
Hallucinations in LLMs are distinct from traditional NLG errors due to massive uncurated training data and the versatility of the models.
Fact-conflicting hallucination is the most challenging type to detect because it requires external world knowledge for verification.
Current evaluation relies heavily on expensive human annotation or model-based evaluation, highlighting a need for more reliable automated benchmarks.
Mitigation strategies are categorized into detection, explanation, and reduction, but a 'silver bullet' solution remains elusive.
📚 Prerequisite Knowledge
Prerequisites
Basic understanding of Large Language Models (LLMs)
Familiarity with Natural Language Generation (NLG)
Concepts of supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF)
Key Terms
SFT: Supervised Fine-Tuning—training a pre-trained model on labeled data to improve performance on specific tasks
RLHF: Reinforcement Learning from Human Feedback—a method to align LLMs with human intent by using rewards derived from human preferences
NLG: Natural Language Generation—the subfield of AI focused on generating natural language text
hallucination: The generation of content by an LLM that deviates from user input, contradicts previously generated context, or misaligns with established world knowledge
input-conflicting hallucination: When LLM-generated content deviates from the source input provided by users (e.g., misinterpreting instructions or summarizing incorrectly)
context-conflicting hallucination: When LLM-generated content conflicts with information the model previously generated within the same conversation
fact-conflicting hallucination: When LLM-generated content contradicts established world knowledge or cannot be verified against it
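The three-way taxonomy defined above can be written down as a small data structure, which may help when labeling model outputs by hand. The following is a minimal sketch, not something the survey provides: the names `HallucinationType` and `HallucinationAnnotation` and their fields are hypothetical, chosen only for illustration. The example record reuses the Afonso II case from the summary.

```python
from dataclasses import dataclass
from enum import Enum


class HallucinationType(Enum):
    """The three categories from the survey's taxonomy (class name is hypothetical)."""
    INPUT_CONFLICTING = "input-conflicting"      # deviates from the user's source input
    CONTEXT_CONFLICTING = "context-conflicting"  # conflicts with earlier generated text
    FACT_CONFLICTING = "fact-conflicting"        # contradicts established world knowledge


@dataclass(frozen=True)
class HallucinationAnnotation:
    """Hypothetical record for one labeled model response (not from the paper)."""
    prompt: str
    response: str
    label: HallucinationType
    note: str = ""


# The fact-conflicting example quoted in the summary.
example = HallucinationAnnotation(
    prompt="Who was the mother of Afonso II?",
    response="Queen Urraca of Castile",
    label=HallucinationType.FACT_CONFLICTING,
    note="The summary gives Dulce Berenguer of Barcelona as the correct answer.",
)

print(example.label.value)
```

Using an enum rather than free-text labels keeps annotations restricted to the three categories, so a typo in a label raises an error instead of silently creating a fourth class.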