
An Audit on the Perspectives and Challenges of Hallucinations in NLP

Pranav Narayanan Venkit, Tatiana Chakravorti, Vipul Gupta, Heidi Biggs, Mukund Srinath, Koustava Goswami, Sarah Rajtmajer, Shomir Wilson
College of Information Sciences and Technology, Pennsylvania State University, Adobe Research, School of Interactive Computing, Georgia Institute of Technology
arXiv (2024)
Factuality Benchmark MM

📝 Paper Summary

Definition and Taxonomy of Hallucination, Metrics and Evaluation
An audit of 103 NLP papers and a survey of 171 practitioners reveal a lack of consensus on definitions and metrics for LLM hallucinations, with significant divergence between academic frameworks and real-world perceptions.
Core Problem
The term 'hallucination' is used inconsistently across NLP research without a unified definition or measurement framework, leading to fragmented understanding and mismatched mitigation strategies.
Why it matters:
  • 57.3% of audited papers discussing hallucination do not even define the term, creating ambiguity in research goals
  • Without consensus, the term risks being misapplied across very different contexts, such as image captioning and text generation
  • Practitioners and researchers disagree on terminology, with many preferring terms like 'fabrication' or 'confabulation' to avoid anthropomorphism
Concrete Example: One paper defines hallucination as 'nonsensical' output, while another defines it as 'plausible but unfaithful'. A practitioner might view invented content in creative story generation as a feature, while a medical researcher views the same 'hallucination' as a critical failure.
Key Novelty
Dual-Method Audit of Hallucination Conceptualization
  • Systematically audits 103 peer-reviewed NLP publications to categorize how hallucination is defined (or not) and measured (statistical vs. data-driven vs. human)
  • Conducts a survey of 171 NLP/AI practitioners to contrast academic definitions with real-world perceptions, revealing a preference for terms like 'fabrication' and recognizing creative uses of hallucination
Evaluation Highlights
  • Only 42.7% of the 103 audited papers explicitly define 'hallucination', with the majority (57.3%) providing no definition despite focusing on the topic
  • 40.46% of surveyed practitioners prefer the term 'Fabrication' over 'Hallucination' to describe the phenomenon, citing the latter's improper anthropomorphism
  • 92% of survey respondents view hallucination as a weakness, yet ~12% also associate it positively with creativity in tasks like storytelling
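As a quick sanity check on the audit figures above, the two reported percentages can be converted back to approximate paper counts. This is a minimal sketch: the helper name `approx_count` is hypothetical, and it assumes both percentages are fractions of all 103 audited papers, as the summary states. The survey percentages are left out because the summary does not give the number of respondents per question.

```python
def approx_count(percent: float, total: int) -> int:
    """Round a reported percentage of `total` to the nearest whole item."""
    return round(percent / 100 * total)

# Figures taken from the summary; the denominator is assumed to be all audited papers.
TOTAL_PAPERS = 103

defined = approx_count(42.7, TOTAL_PAPERS)    # papers that define 'hallucination'
undefined = approx_count(57.3, TOTAL_PAPERS)  # papers that give no definition

# The two groups should partition the audited set.
consistent = (defined + undefined == TOTAL_PAPERS)
print(defined, undefined, consistent)
```

Under that assumption, the two percentages correspond to roughly 44 and 59 papers, which together account for the full audited set.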
Breakthrough Assessment
7/10
Provides a critical meta-analysis rather than a new model. It highlights significant methodological flaws in the field (undefined terms, inconsistent metrics) that hinder progress.