Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models

📝 Paper Summary

Hallucination survey, Factuality evaluation

This survey provides a comprehensive taxonomy of hallucinations in Large Language Models (LLMs), distinguishing them from traditional NLG errors, and reviews current detection, explanation, and mitigation strategies.

Core Problem

Large Language Models frequently generate content that conflicts with user input, previous context, or established facts, undermining their reliability in real-world applications.

Why it matters:

LLMs are trained on massive, uncurated web data containing fabricated or biased information, making hallucinations hard to eliminate at the source
The versatility of LLMs across tasks and domains makes comprehensive evaluation and mitigation significantly harder than in task-specific models
Hallucinations in critical fields like medicine or law can cause tangible real-world harm, because highly plausible but false statements are hard for users to notice

Concrete Example: When asked about the mother of Afonso II, an LLM might confidently answer 'Queen Urraca of Castile' instead of the correct 'Dulce Berenguer of Barcelona', presenting a fact-conflicting hallucination that misleads users.

Key Novelty

LLM-Specific Hallucination Taxonomy

Redefines hallucination for the LLM era by categorizing it into three distinct types: input-conflicting, context-conflicting, and fact-conflicting
Differentiates hallucination from other common LLM issues like ambiguity, incompleteness, bias, and under-informativeness

Architecture

The survey is organized around the LLM lifecycle, mapping where hallucinations can be introduced and where they can be addressed

Evaluation Highlights

The paper is a survey and does not propose a new model or report novel experimental results; it aggregates existing benchmarks.
Provides a taxonomy of evaluation benchmarks and analyzes existing approaches for mitigation.
Discusses unique challenges posed by massive training data, versatility, and error imperceptibility in LLMs.

Breakthrough Assessment

7/10

A timely and comprehensive survey that clarifies definitions and categorizations in a rapidly evolving field, though it aggregates existing knowledge rather than proposing a new technical method.

⚙️ Technical Details

Problem Definition

Setting: Survey and taxonomy of Hallucination in Large Language Models

Inputs: Literature on LLM hallucination

Outputs: Taxonomy, analysis of methods, and future directions

Pipeline Flow

Definition & Taxonomy (Categorizing hallucination types)
Benchmarks & Evaluation (Reviewing datasets and metrics)
Sources (Analyzing causes: data, training, inference)
Mitigation (Reviewing detection and reduction methods)

System Modules

Taxonomy Definition

Define three categories of hallucination (Input-, Context-, Fact-conflicting)

Model or implementation: Conceptual Framework

Novel Architectural Elements

Three-part taxonomy specifically tailored for general-purpose LLMs, expanding beyond traditional task-specific NLG definitions

Modeling

Base Model: Survey of various LLMs (e.g., GPT series, LLaMA)

Training Method: Literature Review / Survey

Compute: Not reported in the paper

Limitations

The survey is a snapshot in time; the field is moving fast, so lists of papers may become outdated quickly
Focuses primarily on fact-conflicting hallucination due to its predominance in current research, potentially under-representing other types
Does not propose a unified automated metric that solves the evaluation challenge

Reproducibility

Code: https://github.com/HillZhang1999/llm-hallucination-survey

The paper is a survey; the open-source repository linked above lists the papers and resources it discusses.

📊 Experiments & Results

Evaluation Setup

Review of existing benchmarks and evaluation protocols

Metrics:

Statistical methodology: Not explicitly reported in the paper

Main Takeaways

Hallucinations in LLMs are distinct from traditional NLG errors due to massive uncurated training data and the versatility of the models.
Fact-conflicting hallucination is the most challenging type to detect because it requires external world knowledge for verification.
Current evaluation relies heavily on expensive human annotation or model-based evaluation, highlighting a need for more reliable automated benchmarks.
Existing efforts span detection, explanation, and mitigation of hallucination, but a 'silver bullet' solution remains elusive.
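
To make the point about external knowledge concrete, here is a minimal sketch of a reference-based check for fact-conflicting output. It is not a method from the survey: the `REFERENCE_FACTS` store, the `normalize` helper, and the `conflicts_with_reference` function are hypothetical names introduced for illustration, and the single stored fact is the Afonso II example quoted earlier in this summary.

```python
import re

# Toy reference store (an assumption for illustration). A real system would
# query a knowledge base or a retrieval component instead of a dict.
REFERENCE_FACTS = {
    "mother of afonso ii": "Dulce Berenguer of Barcelona",
}


def normalize(text: str) -> str:
    """Lowercase and collapse punctuation and whitespace for lenient matching."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def conflicts_with_reference(question_key: str, model_answer: str):
    """Return True or False when a reference exists, or None when none is stored."""
    reference = REFERENCE_FACTS.get(normalize(question_key))
    if reference is None:
        return None  # no external knowledge available, so no verdict
    # Flag a conflict when the reference string does not appear in the answer.
    return normalize(reference) not in normalize(model_answer)


print(conflicts_with_reference("mother of Afonso II", "Queen Urraca of Castile"))
print(conflicts_with_reference("mother of Afonso II", "Dulce Berenguer of Barcelona."))
```

The `None` branch reflects the core difficulty: without a stored reference the check cannot say anything. Substring matching is also deliberately naive; an answer that mentions the reference while negating it would pass, which is one reason practical evaluation leans on human or model-based judgment.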

📚 Prerequisite Knowledge

Prerequisites

Basic understanding of Large Language Models (LLMs)
Familiarity with Natural Language Generation (NLG)
Concepts of supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF)

Key Terms

SFT: Supervised Fine-Tuning—training a pre-trained model on labeled data to improve performance on specific tasks

RLHF: Reinforcement Learning from Human Feedback—a method to align LLMs with human intent by using rewards derived from human preferences

NLG: Natural Language Generation—the subfield of AI focused on generating natural language text

hallucination: The generation of content by an LLM that deviates from user input, contradicts previously generated context, or misaligns with established world knowledge

input-conflicting hallucination: When LLM-generated content deviates from the source input provided by users (e.g., misinterpreting instructions or summarizing incorrectly)

context-conflicting hallucination: When LLM-generated content conflicts with information it previously generated within the same conversation

fact-conflicting hallucination: When LLM-generated content contradicts established world knowledge or cannot be verified by it
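
The three definitions above can be captured as a small labeling schema, which is one way an annotation effort might record which kind of hallucination a response exhibits. This is a sketch under stated assumptions: the `HallucinationType` enum, the `Annotation` record, and `count_by_type` are hypothetical names, not artifacts of the survey, and the example prompt paraphrases the Afonso II case from this summary.

```python
from dataclasses import dataclass
from enum import Enum


class HallucinationType(Enum):
    INPUT_CONFLICTING = "input-conflicting"      # deviates from the user's source input
    CONTEXT_CONFLICTING = "context-conflicting"  # contradicts the model's earlier output
    FACT_CONFLICTING = "fact-conflicting"        # contradicts established world knowledge


@dataclass(frozen=True)
class Annotation:
    """One labeled model response (hypothetical schema, not from the survey)."""
    prompt: str
    response: str
    label: HallucinationType
    note: str = ""


def count_by_type(annotations):
    """Tally annotations per taxonomy category, including categories with zero hits."""
    counts = {t: 0 for t in HallucinationType}
    for a in annotations:
        counts[a.label] += 1
    return counts


example = Annotation(
    prompt="Who was the mother of Afonso II?",  # paraphrased prompt, for illustration
    response="Queen Urraca of Castile",
    label=HallucinationType.FACT_CONFLICTING,
    note="Reference answer in the summary's example: Dulce Berenguer of Barcelona",
)

for kind, n in count_by_type([example]).items():
    print(kind.value, n)
```

Keeping the categories in an enum rather than free-text labels means a typo in a label fails loudly at construction time instead of silently creating a fourth category.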