A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions

📝 Paper Summary

Hallucination Taxonomy, Hallucination Causes, Hallucination Detection, Hallucination Mitigation

This survey reclassifies LLM hallucinations into factuality and faithfulness categories, traces their causes through the data, training, and inference stages, and systematically reviews detection and mitigation strategies.

Core Problem

LLMs are prone to generating plausible yet non-factual content (hallucinations), which poses significant risks to reliability in real-world applications such as search engines and medical advice.

Why it matters:

Misleading information from widely used systems (chatbots, search) can spread false beliefs or cause harm in decision-making
Traditional NLG hallucination definitions (intrinsic/extrinsic) are insufficient for open-ended LLMs, which require a broader scope covering factual errors and user-instruction misalignment
The convincing, human-like nature of LLM responses makes detecting these errors particularly challenging for users

Concrete Example: When asked about the Eiffel Tower's environmental impact, an LLM might fabricate a claim that it 'led to the extinction of the Parisian tiger'—a species that never existed. This is a Factual Fabrication (Unverifiability) hallucination.

Key Novelty

Comprehensive LLM-Specific Hallucination Taxonomy and Cause Analysis

Proposes a new taxonomy splitting hallucinations into 'Factuality' (conflict with real-world facts) and 'Faithfulness' (conflict with user instructions, context, or self-consistency)
Analyzes causes across three stages: Data (misinformation, biases), Training (pre-training flaws, misalignment), and Inference (decoding strategies, softmax bottlenecks)
Connects specific mitigation strategies (like RAG or feedback-based editing) directly to the identified causes in a structured framework

Evaluation Highlights

Surveys benchmarks like TruthfulQA, HalluQA, and HaluEval-2.0 to quantify hallucination rates
Highlights that detection methods include both factuality checks (using external knowledge) and faithfulness checks (checking consistency with context/instructions)
Discusses mitigation success in specific areas, such as RAG (Retrieval-Augmented Generation) reducing factual errors, though noting RAG itself can suffer from hallucinations

Breakthrough Assessment

9/10

A foundational survey that establishes the standard taxonomy for LLM hallucinations. It structures a chaotic field into clear categories (Factuality vs. Faithfulness) and causes, guiding future research effectively.

⚙️ Technical Details

Problem Definition

Setting: General-purpose text generation where the output y must be consistent with real-world facts and faithful to the input x (instructions/context)

Inputs: User prompt x containing instructions and optionally context

Outputs: Generated response y

Pipeline Flow

Taxonomy Definition (Factuality vs. Faithfulness)
Cause Analysis (Data, Training, Inference)
Detection & Benchmarking
Mitigation Strategies

System Modules

Taxonomy Framework

Classifies hallucinations into Factuality (Contradiction, Fabrication) and Faithfulness (Instruction, Context, Logical Inconsistency)

Model or implementation: Conceptual Framework
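The two-level taxonomy above can be written down as a small data structure. This is only an illustrative sketch: the class and function names (`Category`, `HallucinationType`, `types_in`) are hypothetical and do not come from the survey.

```python
from enum import Enum


class Category(Enum):
    FACTUALITY = "factuality"      # conflict with real-world facts
    FAITHFULNESS = "faithfulness"  # conflict with instructions, context, or own logic


class HallucinationType(Enum):
    # Each member carries a readable label and its parent category.
    FACTUAL_CONTRADICTION = ("factual contradiction", Category.FACTUALITY)
    FACTUAL_FABRICATION = ("factual fabrication", Category.FACTUALITY)
    INSTRUCTION_INCONSISTENCY = ("instruction inconsistency", Category.FAITHFULNESS)
    CONTEXT_INCONSISTENCY = ("context inconsistency", Category.FAITHFULNESS)
    LOGICAL_INCONSISTENCY = ("logical inconsistency", Category.FAITHFULNESS)

    def __init__(self, label, category):
        self.label = label
        self.category = category


def types_in(category):
    """List every hallucination type that belongs to the given category."""
    return [t for t in HallucinationType if t.category is category]


# The 'Parisian tiger' example would be tagged as a factual fabrication.
example_tag = HallucinationType.FACTUAL_FABRICATION
```

Keeping the category on each member means a detector's output can be aggregated per category without a separate lookup table.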

Cause Analyzer

Identifies root causes

Model or implementation: Analysis

Detector

Identifies presence of hallucinations

Model or implementation: Various (e.g., SelfCheckGPT, FactTool)
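One family of detectors checks whether a claim stays consistent across several independently sampled responses. The sketch below illustrates that idea with a crude token-overlap score. It is a simplification written for this summary, not the SelfCheckGPT or FactTool implementation, and every function name in it is hypothetical.

```python
import re


def tokens(text):
    """Lowercase word tokens as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def support_score(sentence, sample):
    """Fraction of the sentence's tokens that also occur in one sample."""
    sent = tokens(sentence)
    if not sent:
        return 1.0
    return len(sent & tokens(sample)) / len(sent)


def inconsistency(sentence, samples):
    """1 minus the mean support across samples; higher suggests hallucination."""
    if not samples:
        raise ValueError("need at least one sample")
    mean_support = sum(support_score(sentence, s) for s in samples) / len(samples)
    return 1.0 - mean_support


# Assumed setup: `samples` stands in for extra responses drawn from the same model.
samples = [
    "The Eiffel Tower was completed in 1889 in Paris.",
    "Completed in 1889, the Eiffel Tower stands in Paris.",
]
grounded = "The Eiffel Tower was completed in 1889."
fabricated = "It caused the extinction of the Parisian tiger."

grounded_score = inconsistency(grounded, samples)
fabricated_score = inconsistency(fabricated, samples)
```

A real detector would replace token overlap with a stronger consistency signal, such as an entailment model or a question-answering check, but the control flow stays the same.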

Novel Architectural Elements

Unified Taxonomy: Explicit separation of Faithfulness (instruction/context adherence) from Factuality (world knowledge)
Cause-Effect Mapping: Structuring mitigation strategies directly against the three stages of causes (Data, Training, Inference)
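The cause-to-mitigation mapping can be pictured as a lookup keyed by stage. The sketch below uses only the causes and mitigations named in this summary; the stage assignments follow the summary's wording and the survey's own grouping may differ. All identifiers are hypothetical.

```python
from enum import Enum


class Stage(Enum):
    DATA = "data"
    TRAINING = "training"
    INFERENCE = "inference"


# Causes per stage, as listed in this summary.
CAUSES = {
    Stage.DATA: ["misinformation", "biases"],
    Stage.TRAINING: ["pre-training flaws", "misalignment"],
    Stage.INFERENCE: ["decoding strategies", "softmax bottleneck"],
}

# Mitigations per stage, as listed in this summary (an assumed grouping).
MITIGATIONS = {
    Stage.DATA: ["data cleaning"],
    Stage.TRAINING: ["alignment via RLHF"],
    Stage.INFERENCE: ["RAG", "decoding constraints"],
}


def mitigations_for(cause):
    """Return the stage a cause belongs to and the mitigations listed for it."""
    for stage, causes in CAUSES.items():
        if cause in causes:
            return stage, MITIGATIONS[stage]
    raise KeyError(cause)


stage, options = mitigations_for("misinformation")
```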

Modeling

Base Model: Covers multiple LLMs (GPT-4, LLaMA, Claude, Gemini, PaLM)

Training Method: Survey covers SFT and RLHF as causes/solutions

Objective Functions:

Purpose: Pre-training objective.

Formally: Next-token prediction (maximizing the likelihood of the text corpus)

Purpose: RLHF alignment.

Formally: Maximizing reward from preference model using PPO (Proximal Policy Optimization)
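For reference, the two objectives are commonly written as follows. This is the standard textbook form rather than notation taken from the survey. The token symbols w_t, the reward model r_φ, the reference policy π_ref, and the KL penalty weight β are assumptions introduced here; x and y are the prompt and response from the problem definition.

```latex
% Pre-training: negative log-likelihood of a token sequence w_1, ..., w_T
\mathcal{L}_{\mathrm{PT}}(\theta) = -\sum_{t=1}^{T} \log p_\theta\left(w_t \mid w_{<t}\right)

% RLHF: expected reward for response y to prompt x, with an assumed KL penalty
\max_{\theta}\; \mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi_\theta(\cdot \mid x)}
\left[ r_\phi(x, y) \right]
- \beta\, \mathbb{E}_{x \sim \mathcal{D}}\left[ D_{\mathrm{KL}}\left( \pi_\theta(\cdot \mid x) \,\|\, \pi_{\mathrm{ref}}(\cdot \mid x) \right) \right]
```

Neither objective rewards factual correctness directly. The first rewards matching the corpus and the second rewards what the preference model scores highly, which is one way to see why both stages appear among the causes.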

Adaptation: Discusses various adaptation methods like RAG and Model Editing

Trainable Parameters: Not reported in the paper

Training Data:

Discusses flaws in pre-training data (misinformation)
Discusses flaws in SFT data (inferior alignment data)

Compute: Not reported in the paper

Comparison to Prior Work

vs. Ji et al. (2023): Extends scope from standard NLG to LLMs, addressing open-ended generation
vs. Tonmoy et al. (2024): Covers taxonomy, causes, and detection in addition to mitigation
vs. Wang et al. (2023): Includes Faithfulness hallucinations (instruction following, consistency) alongside Factuality
vs. Zhang et al. (2023): Proposes a unique taxonomy and specifically maps mitigation strategies to the Data/Training/Inference causes

Limitations

Survey nature means no new empirical results are generated; relies on existing literature
Rapidly evolving field means some specific model references (e.g., LLaMA-1) may become dated quickly
Focuses primarily on text, with only brief mention of vision-language models in future directions

Reproducibility

Code: https://github.com/adwardlee/huggingface-llm-survey

The paper is a survey and does not propose a single model to reproduce. It provides a GitHub repository (https://github.com/adwardlee/huggingface-llm-survey) containing the list of reviewed papers and resources.

📊 Experiments & Results

Evaluation Setup

Review of existing benchmarks and evaluation protocols

Benchmarks:

TruthfulQA (QA measuring truthfulness vs. imitative falsehoods)
HalluQA (Chinese hallucination benchmark)
HaluEval (Large-scale hallucination evaluation benchmark)
FELM (Factuality evaluation)
SelfCheckGPT-Wikibio (Hallucination detection benchmark)

Metrics:

Factuality Score
Faithfulness Score
Detection Accuracy

Statistical methodology: Not explicitly reported in the paper

Main Takeaways

Hallucinations are not just factual errors but also failures in following instructions or internal logic (faithfulness).
Causes are systemic: they arise from flawed training data (misinformation, biases), training-stage flaws in pre-training and alignment, and inference-stage limitations such as imperfect decoding strategies and the softmax bottleneck.
Mitigation requires a multi-pronged approach: cleaning data, aligning models via RLHF, and using inference-time techniques like RAG or decoding constraints.
RAG is a powerful mitigation tool but introduces its own challenges, such as retrieving irrelevant contexts that mislead the generator.
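The last takeaway, that irrelevant retrieved context can mislead the generator, can be illustrated with a toy retrieval step that drops low-relevance passages before building the prompt. This is a minimal sketch under stated assumptions: the Jaccard relevance score, the `min_score` threshold, the prompt wording, and all function names are hypothetical stand-ins for a real retriever and are not taken from the survey.

```python
import re


def tokens(text):
    """Lowercase word tokens as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def relevance(query, passage):
    """Jaccard overlap between query and passage tokens (toy relevance score)."""
    q, p = tokens(query), tokens(passage)
    return len(q & p) / len(q | p) if q | p else 0.0


def retrieve(query, corpus, k=2, min_score=0.1):
    """Return up to k passages, dropping any below the relevance threshold."""
    scored = sorted(((relevance(query, p), p) for p in corpus), reverse=True)
    return [p for score, p in scored[:k] if score >= min_score]


def build_prompt(query, passages):
    """Ask the model to answer only from the retrieved context."""
    context = "\n".join(f"- {p}" for p in passages) or "- (no relevant context found)"
    return (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )


corpus = [
    "The Eiffel Tower was completed in 1889.",
    "Tigers are native to Asia.",
    "The Louvre is a museum in Paris.",
]
query = "When was the Eiffel Tower completed?"
passages = retrieve(query, corpus)
prompt = build_prompt(query, passages)
```

Without the threshold, the second-ranked passage about the Louvre would enter the prompt even though it says nothing about the question, which is the failure mode the takeaway describes.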

📚 Prerequisite Knowledge

Prerequisites

Understanding of LLM training stages (Pre-training, SFT, RLHF)
Basic knowledge of Transformer architecture and decoding strategies
Familiarity with standard NLP evaluation metrics

Key Terms

Factuality Hallucination: Generated content that contradicts verifiable real-world facts (e.g., wrong dates, fabricated events)

Faithfulness Hallucination: Generated content that diverges from user instructions, provided context, or internal logical consistency, regardless of real-world factual correctness

Intrinsic Hallucination: Generated output that directly contradicts the provided source content (traditional NLG definition)

Extrinsic Hallucination: Generated output that cannot be verified from the source content (traditional NLG definition)

SFT: Supervised Fine-Tuning—training the model on labeled (instruction, response) pairs to learn to follow instructions

RLHF: Reinforcement Learning from Human Feedback—aligning the model with human preferences using a reward model and reinforcement learning

RAG: Retrieval-Augmented Generation—enhancing model generation by retrieving relevant external documents to ground the response

Softmax Bottleneck: A theoretical limitation in the final layer of language models that restricts their ability to model complex probability distributions, potentially leading to hallucinations

Entity-error hallucination: A subtype of factual contradiction where the model generates erroneous entities (e.g., wrong inventor name)

Relation-error hallucination: A subtype of factual contradiction where the model asserts incorrect relationships between entities

Unverifiability hallucination: A subtype of factual fabrication where the model asserts something that does not exist or cannot be verified

Overclaim hallucination: A subtype of factual fabrication where the model presents subjective or controversial opinions as universally valid facts