A comprehensive taxonomy of hallucinations in Large Language Models

📝 Paper Summary

Hallucination Taxonomy · Theoretical Limits of LLMs · Hallucination Mitigation

This report establishes a formal taxonomy for LLM hallucinations, proving their theoretical inevitability in computable models while categorizing their manifestations, causes, and mitigation strategies.

Core Problem

LLMs frequently generate plausible but factually incorrect or fabricated content, posing risks to reliability in critical applications like healthcare and law.

Why it matters:

Hallucinations in safety-critical domains (medical, legal) can lead to severe consequences, such as misinforming patients or citing non-existent court cases
Current definitions are inconsistent (e.g., intrinsic vs. extrinsic), hindering comparative research and the development of unified mitigation strategies
Users often over-rely on confident but incorrect AI outputs due to cognitive biases like automation bias and the fluency heuristic

Concrete Example: Given a source article stating that the FDA approved an Ebola vaccine in 2019, an LLM summary hallucinates intrinsically if it claims the FDA rejected the vaccine, and extrinsically if it adds a claim the source does not contain, such as that the Parisian Tiger was hunted to extinction in 1885.
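The distinction in this example can be sketched as a toy labeling rule. This is only an illustration under strong assumptions: the source is reduced to a hypothetical topic-to-statement dictionary, `classify_claim` is a made-up helper, and the check looks at the source alone (it does not consult real-world knowledge, which the fuller definition of extrinsic hallucination also involves).

```python
def classify_claim(source_facts, topic, statement):
    """Label one generated claim relative to the source document only."""
    if topic not in source_facts:
        # The source says nothing about this topic: not verifiable from it.
        return "extrinsic"
    if source_facts[topic] != statement:
        # The source covers the topic but says something different.
        return "intrinsic"
    return "supported"


# Hypothetical structured view of the source article.
source_facts = {"FDA Ebola vaccine decision": "approved in 2019"}

labels = [
    classify_claim(source_facts, "FDA Ebola vaccine decision", "approved in 2019"),
    classify_claim(source_facts, "FDA Ebola vaccine decision", "rejected"),
    classify_claim(source_facts, "Parisian Tiger", "hunted to extinction in 1885"),
]
print(labels)
```

Real detectors would need entailment or retrieval rather than exact matching; the point here is only that the two hallucination types are triggered by different conditions.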

Key Novelty

Theoretical Inevitability of Hallucination

Posits that hallucination is not merely a bug but an innate limitation of computable LLMs, proven via diagonalization in computability theory
Demonstrates that for any computable LLM there exists a ground truth function on which the model hallucinates for infinitely many inputs
Argues that self-elimination of hallucination is impossible, necessitating external aids like RAG or human oversight
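The diagonalization idea behind this argument can be illustrated with a finite toy. This is a sketch, not the paper's proof: the three lambda "models" are hypothetical stand-ins for an enumeration of computable LLMs, and `diagonal_ground_truth` is an illustrative helper. The actual argument works over an infinite enumeration and yields infinitely many failing inputs.

```python
from typing import Callable, Dict, List

Model = Callable[[str], str]


def diagonal_ground_truth(models: List[Model], inputs: List[str]) -> Model:
    """Build a ground truth f with f(inputs[i]) != models[i](inputs[i])."""
    table: Dict[str, str] = {}
    for model, s in zip(models, inputs):
        # Appending a character guarantees f(s) differs from the model's answer.
        table[s] = model(s) + "!"

    def f(s: str) -> str:
        return table.get(s, "")

    return f


# Hypothetical stand-ins for an enumeration h_0, h_1, h_2 of LLMs.
models: List[Model] = [lambda s: s.upper(), lambda s: s[::-1], lambda s: "42"]
inputs = ["s0", "s1", "s2"]

f = diagonal_ground_truth(models, inputs)
disagreements = [m(s) != f(s) for m, s in zip(models, inputs)]
print(disagreements)
```

Each model is wrong on its own "diagonal" input by construction, so no model in the list matches this ground truth everywhere.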

Evaluation Highlights

Identifies that logical inconsistencies account for 19% of hallucination cases in surveyed analyses
Notes that temporal disorientation (errors with time-sensitive info) accounts for 12% of identified hallucination cases
Reports that ethical violations (harmful/defamatory content) represent 6% of hallucination cases

Breakthrough Assessment

7/10

While primarily a survey and taxonomy, the paper significantly strengthens the theoretical foundation by formalizing the 'inevitability' argument based on computability theory, shifting the field's focus from elimination to mitigation.

⚙️ Technical Details

Problem Definition

Setting: Formal world of computable functions where an LLM h attempts to approximate a ground truth function f

Inputs: Input string s from the set of all finite-length strings S

Outputs: Output string h(s) which should match f(s)
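
Written out with the symbols above, the setting and the inevitability claim might be stated as follows. The notation is a reconstruction from this summary, not copied from the paper.

```latex
% h : S -> S is the LLM, f : S -> S is the ground truth function.
\[
  h \text{ hallucinates on } s \in S \iff h(s) \neq f(s)
\]
% Inevitability: every computable h fails on infinitely many inputs
% for some ground truth f.
\[
  \forall\, h \text{ computable} \;\; \exists\, f \;:\;
  \bigl|\{\, s \in S : h(s) \neq f(s) \,\}\bigr| = \infty
\]
```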

Pipeline Flow

Taxonomy Definition (Intrinsic/Extrinsic, Factuality/Faithfulness)
Theoretical Framework (Computability Proofs)
Causal Analysis (Data, Model, Prompt)
Mitigation Survey (Architectural, Systemic)

System Modules

Theoretical Framework

Prove inevitability of hallucinations

Model or implementation: Formal mathematical logic (Computability Theory)

Novel Architectural Elements

Does not propose a new architecture but synthesizes existing ones (Toolformer, RAG) into a mitigation taxonomy
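
As a rough illustration of the RAG side of that taxonomy, the sketch below retrieves a supporting document and places it in the prompt. Everything in it is an assumption made for illustration: the word-overlap retriever, the function names, and the prompt wording are hypothetical, and no actual LLM is called.

```python
def retrieve(query, documents, k=1):
    """Rank documents by word overlap with the query (toy retriever)."""
    query_words = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_grounded_prompt(query, documents, k=1):
    """Prepend the retrieved context so the answer can be grounded in it."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Hypothetical external document store.
docs = [
    "The FDA approved an Ebola vaccine in 2019.",
    "Retrieval systems index external documents.",
]
prompt = build_grounded_prompt("When did the FDA approve an Ebola vaccine?", docs)
print(prompt)
```

A production system would typically use dense embeddings and a vector index instead of word overlap; the grounding step itself stays the same.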

Comparison to Prior Work

vs. Standard Surveys: This paper adds a formal proof of inevitability based on computability theory, unlike purely empirical surveys
vs. Medical-specific reviews: This taxonomy is domain-agnostic, covering code, multimodal, and general text, rather than just medical hallucinations
vs. Ji et al. (Survey of Hallucination in NLG) [not cited in paper]: Similar breadth, but this paper places stronger emphasis on the 'computable function' theoretical limit

Limitations

The theoretical proof relies on the assumption of 'computable LLMs' and strict ground truth functions, which may not perfectly map to all probabilistic nuances of modern deep learning
Lack of a unified, universally accepted definition of hallucination across the field continues to hinder standardized benchmarking
Does not propose a novel mitigation algorithm, only surveys existing ones

Reproducibility

This is a survey/theoretical paper. No specific code or model weights are associated with it, though it references many external benchmarks (TruthfulQA, HalluLens, FActScore).

📊 Experiments & Results

Evaluation Setup

Survey of existing benchmarks and empirical studies on hallucination rates

Benchmarks:

TruthfulQA (Adversarial open-domain QA)
HalluLens (Multi-task hallucination detection)
FActScore (Summarization factuality evaluation)
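
The atomic-fact idea behind FActScore can be sketched as the fraction of atomic facts that a knowledge source supports. This is a simplified stand-in, not the benchmark's implementation: the atomic facts are assumed to be already extracted, support is checked by exact set membership, and `atomic_fact_precision` is a hypothetical name.

```python
def atomic_fact_precision(atomic_facts, knowledge):
    """Fraction of atomic facts that appear in the knowledge source."""
    if not atomic_facts:
        raise ValueError("need at least one atomic fact")
    supported = sum(1 for fact in atomic_facts if fact in knowledge)
    return supported / len(atomic_facts)


# Hypothetical knowledge source and pre-extracted atomic facts.
knowledge = {
    "the fda approved an ebola vaccine",
    "the approval happened in 2019",
}
facts = [
    "the fda approved an ebola vaccine",
    "the approval happened in 2019",
    "the vaccine was rejected in 2020",
]
score = atomic_fact_precision(facts, knowledge)
print(round(score, 3))
```

In practice both the decomposition into atomic facts and the support check are done with models or retrieval, not string equality.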

Metrics:

Hallucination Rate (%)
Factuality Score
Statistical methodology: Not explicitly reported in the paper
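
The Hallucination Rate metric listed above is not given an exact formula in this summary. A minimal sketch, assuming it is simply the percentage of evaluated outputs that annotators flag as hallucinated:

```python
def hallucination_rate(labels):
    """Percentage of outputs flagged as hallucinated (labels are booleans)."""
    if not labels:
        raise ValueError("need at least one labeled output")
    return 100.0 * sum(labels) / len(labels)


# Hypothetical annotations: True means the output was judged a hallucination.
labels = [True, False, False, True, False]
rate = hallucination_rate(labels)
print(rate)
```

Individual benchmarks may define the rate per claim, per sentence, or per response, so figures from different studies are not necessarily comparable.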

Key Results

The paper aggregates statistics from various studies to quantify the prevalence of different hallucination types.

Source	Hallucination type	Share of cases (%)
Aggregated studies	Logical inconsistencies	19
Aggregated studies	Temporal disorientation	12
Aggregated studies	Ethical violations	6

Main Takeaways

Hallucination is theoretically inevitable for computable LLMs, meaning the goal must be mitigation and management rather than complete elimination
Different types of hallucinations (intrinsic vs. extrinsic) require distinct detection and mitigation strategies (e.g., reasoning checks vs. RAG)
Human cognitive biases (automation bias, fluency heuristic) significantly exacerbate the risks of hallucination, necessitating interface-level interventions like confidence displays
Causes are multifaceted: data quality (static, noisy), model architecture (auto-regressive, lack of reasoning), and user prompts (adversarial inputs) all contribute

📚 Prerequisite Knowledge

Prerequisites

Basic understanding of Large Language Models and their auto-regressive nature
Computability theory (specifically diagonalization and computable functions)
Familiarity with NLP evaluation metrics

Key Terms

computable LLM: An LLM defined as a computable function within a formal mathematical framework, subject to the limits of computability theory

diagonalization: A proof technique that constructs an object differing from every member of an enumerated collection in at least one position; used here to construct a ground truth function that every computable LLM gets wrong on some inputs

intrinsic hallucination: Generated content that contradicts the provided input context (e.g., a summary contradicting the source text)

extrinsic hallucination: Generated content that cannot be verified from the source text and contradicts real-world knowledge or training data

RAG: Retrieval-Augmented Generation, a technique that grounds LLM responses in external documents to reduce hallucinations

fluency heuristic: A cognitive bias where users judge the accuracy of information based on how grammatically correct and smooth the text appears

automation bias: The tendency for humans to over-rely on automated systems and accept their outputs as correct, even when they are not

Toolformer: An architectural approach where LLMs are trained to use external tools (calculators, APIs) to improve accuracy

FActScore: A metric that scores the factual precision of generated text by breaking it into atomic facts and checking each one against a knowledge source

HalluLens: A benchmark systematically mapping hallucinations to a taxonomy including factual, ethical, logical, and temporal dimensions

TruthfulQA: A benchmark of adversarially constructed questions designed to test whether models mimic human falsehoods or misconceptions