
Sources of Hallucination by Large Language Models on Inference Tasks

Nick McKenna, Tianyi Li, Liang Cheng, Mohammad Javad Hosseini, Mark Johnson, Mark Steedman
University of Edinburgh, Google Research, Macquarie University
Conference on Empirical Methods in Natural Language Processing (2023)
Factuality, Reasoning, Benchmark, Pretraining

📝 Paper Summary

Hallucination suppression, Knowledge internalization
LLMs hallucinate on inference tasks because they rely on memorized training sentences and simple corpus-frequency heuristics rather than on robust logical reasoning.
Core Problem
Large language models often assert false entailments (hallucinations) on natural language inference tasks, but how the pre-training data distribution gives rise to these errors has been unclear.
Why it matters:
  • If models rely on memory rather than reasoning, they will fail on novel inputs or private data (e.g., legal documents) that contradict the pre-training data
  • Understanding specific biases allows for better evaluation controls and helps explain why models present incorrect information as fact in downstream tasks like QA
  • Current trust in LLMs for logical reasoning may be misplaced if performance stems from shallow heuristics
Concrete Example: Suppose an LLM has memorized 'Whiskey consists chiefly of alcohol'. It may then wrongly claim that 'Whiskey contains alcohol' entails 'Whiskey consists chiefly of alcohol', either because the hypothesis is a memorized sentence or because the premise is less frequent in the corpus than the hypothesis.
Key Novelty
Attestation and Frequency Bias Probe
  • Identifies 'Attestation Bias': Models are more likely to label a relationship as 'Entailment' if the hypothesis sentence appears verbatim in their pre-training data
  • Identifies 'Relative Frequency Bias': Models default to 'Entailment' if the premise event is less frequent in general text than the hypothesis event, mimicking a 'specific-to-general' heuristic
  • Demonstrates that named entities act as 'indices' for memory recall; replacing entities with generic types or rare names breaks the model's reliance on memorization
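The two biases above can be read as simple label-blind heuristics, which makes it possible to split a dataset into samples where a heuristic agrees with the gold label and samples where it contradicts it. The following is a minimal sketch of that idea, not the authors' code: the `Sample` fields, the attestation flag, and the frequency values are hypothetical stand-ins for whatever attestation and corpus-frequency measurements are actually used.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Sample:
    premise: str
    hypothesis: str
    label: bool                # True if the gold label is Entailment
    hypothesis_attested: bool  # hypothetical flag: hypothesis is attested in training data
    premise_freq: float        # hypothetical corpus frequency of the premise
    hypothesis_freq: float     # hypothetical corpus frequency of the hypothesis


def attestation_prediction(s: Sample) -> bool:
    """Attestation bias: predict Entailment whenever the hypothesis is attested."""
    return s.hypothesis_attested


def frequency_prediction(s: Sample) -> bool:
    """Relative frequency bias: predict Entailment when the premise is rarer."""
    return s.premise_freq < s.hypothesis_freq


def split_by_bias(
    samples: List[Sample], bias_prediction: Callable[[Sample], bool]
) -> Tuple[List[Sample], List[Sample]]:
    """Return (consistent, adversarial) subsets relative to one bias heuristic."""
    consistent = [s for s in samples if bias_prediction(s) == s.label]
    adversarial = [s for s in samples if bias_prediction(s) != s.label]
    return consistent, adversarial


# Toy data; the frequency numbers are invented purely for illustration.
samples = [
    Sample("Whiskey contains alcohol", "Whiskey consists chiefly of alcohol",
           label=False, hypothesis_attested=True,
           premise_freq=1.0, hypothesis_freq=5.0),
    Sample("X bought Y", "X owns Y",
           label=True, hypothesis_attested=False,
           premise_freq=2.0, hypothesis_freq=8.0),
]

att_consistent, att_adversarial = split_by_bias(samples, attestation_prediction)
freq_consistent, freq_adversarial = split_by_bias(samples, frequency_prediction)
print(len(att_consistent), len(att_adversarial))
print(len(freq_consistent), len(freq_adversarial))
```

A model that relies on a heuristic should score well on the consistent subset and poorly on the adversarial one, so comparing the two scores gives a rough measure of how much the bias drives its predictions.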
Evaluation Highlights
  • GPT-3.5 is 2.2x more likely to wrongly predict Entailment on random premises if the hypothesis is attested in its training data
  • Performance drops sharply when the biases contradict the labels: LLaMA-65B falls from 65.5% AUC (bias-consistent) to 8.1% (bias-adversarial) on attestation-bias samples
  • GPT-3.5 recall drops from 92.3% to 55.3% when entities in the Levy/Holt dataset are replaced with frequent random entities, indicating reliance on memorized information about specific entities
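A figure such as "2.2x more likely" can be understood as a ratio of false-positive rates. The sketch below shows one plausible way to compute such a ratio, assuming every record pairs a hypothesis with a random premise, so that any Entailment prediction counts as an error. The record layout and the toy values are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# Each record: (hypothesis_attested, model_predicted_entailment).
# Assumption: premises are random, so every predicted Entailment is a false positive.
Record = Tuple[bool, bool]


def false_entailment_rate(predictions: List[bool]) -> float:
    """Fraction of predictions that are Entailment; 0.0 for an empty list."""
    return sum(predictions) / len(predictions) if predictions else 0.0


def attestation_risk_ratio(records: List[Record]) -> float:
    """False-positive rate on attested hypotheses divided by the rate on unattested ones."""
    attested = [pred for is_attested, pred in records if is_attested]
    unattested = [pred for is_attested, pred in records if not is_attested]
    base_rate = false_entailment_rate(unattested)
    if base_rate == 0.0:
        raise ValueError("No false positives on unattested hypotheses; ratio is undefined.")
    return false_entailment_rate(attested) / base_rate


# Toy records, invented for illustration only.
toy_records = [
    (True, True), (True, True), (True, False), (True, False),       # attested: 2 of 4 wrong
    (False, True), (False, False), (False, False), (False, False),  # unattested: 1 of 4 wrong
]
print(attestation_risk_ratio(toy_records))
```

On the toy records the attested group has a false-positive rate of 0.5 and the unattested group 0.25, so the ratio is 2.0.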
Breakthrough Assessment
7/10
A strong behavioral analysis paper that isolates two specific mechanisms (memorization and corpus frequency) behind hallucinations on inference tasks. It does not propose a new architecture, but it provides useful diagnostic insights for the field.