
Mechanisms of Prompt-Induced Hallucination in Vision-Language Models

W Rudman, M Golovanevsky, D Arad, Y Belinkov…
The University of Texas at Austin, Brown University, Technion - Israel Institute of Technology, University of Tübingen, Harvard University
arXiv, January 2026
MM Factuality Reasoning

📝 Paper Summary

Vision–Language Models (VLMs) · Hallucination mitigation · Mechanistic interpretability
The paper identifies specific 'PIH-heads' in Vision–Language Models that prioritize prompt text over visual evidence, showing that ablating them significantly reduces hallucinations without retraining.
Core Problem
Vision–Language Models (VLMs) often hallucinate by prioritizing incorrect information in textual prompts (e.g., mismatched object counts) over conflicting visual evidence.
Why it matters:
  • Real-world user inputs are often noisy or inaccurate, leading deployed VLMs to hallucinate rather than correct the user.
  • Prior work shows VLMs struggle to disentangle conflicting modalities, favoring text over vision, which degrades reliability in tasks like counting.
  • Current mitigation strategies often require extensive retraining or additional data, even though the problem stems from internal routing mechanisms that can be targeted directly.
Concrete Example: When an image contains three waterlilies but the prompt asks to 'Describe the four waterlilies', the model hallucinates a fourth flower and describes it in detail, rather than correcting the count to three.
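The mismatch in this example can be sketched in a few lines of Python. This is a minimal illustration, not the paper's evaluation code: the helper names `build_count_prompt` and `copies_prompt_count`, the prompt template, and the rule for flagging a response (the response count echoes the wrong prompt count) are all assumptions made for this sketch.

```python
# Hypothetical helpers; names, template, and labeling rule are assumptions.
NUMBER_WORDS = {1: "one", 2: "two", 3: "three", 4: "four", 5: "five"}


def build_count_prompt(object_plural, prompt_count):
    """Build a prompt that asserts a specific object count."""
    return f"Describe the {NUMBER_WORDS[prompt_count]} {object_plural}"


def copies_prompt_count(true_count, prompt_count, response_count):
    """Flag a response that echoes an incorrect prompt count.

    true_count: number of objects actually in the image.
    prompt_count: number asserted by the prompt.
    response_count: number of objects the model's response describes.
    """
    return prompt_count != true_count and response_count == prompt_count


# The image shows three waterlilies, but the prompt claims four.
prompt = build_count_prompt("waterlilies", 4)
hallucinated = copies_prompt_count(true_count=3, prompt_count=4, response_count=4)
grounded = copies_prompt_count(true_count=3, prompt_count=4, response_count=3)
```

Here `hallucinated` is true because the response follows the prompt's count of four, while `grounded` is false because the response reports the three flowers actually present.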
Key Novelty
Prompt-Induced Hallucination (PIH) Ablation
  • Identifies a small set of attention heads (PIH-heads) in the early layers of the language model component that act as conduits for copying incorrect prompt information.
  • Demonstrates that simply 'switching off' these heads via mean ablation (replacing each head's output with an average value) stops the model from copying the prompt's error and makes it rely on the image instead, correcting the hallucination.
  • Shows these mechanisms generalize: heads found via object counting also fix hallucinations in color recognition tasks.
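The mean-ablation step described above can be sketched on toy data. This is a minimal, framework-free illustration under stated assumptions: head outputs are represented as plain lists of floats, the means are taken over a small set of reference runs, and the function names `mean_head_outputs` and `mean_ablate` are hypothetical. How the paper selects PIH-heads and which inputs it averages over are not covered here.

```python
from statistics import fmean


def mean_head_outputs(reference_runs):
    """Average each head's output vector over a set of reference runs.

    reference_runs: list of runs; each run is a list of per-head vectors.
    Returns one mean vector per head.
    """
    num_heads = len(reference_runs[0])
    dim = len(reference_runs[0][0])
    return [
        [fmean(run[h][d] for run in reference_runs) for d in range(dim)]
        for h in range(num_heads)
    ]


def mean_ablate(head_outputs, head_means, heads_to_ablate):
    """Replace the selected heads' outputs with their reference means."""
    ablated = set(heads_to_ablate)
    return [
        list(head_means[h]) if h in ablated else list(vec)
        for h, vec in enumerate(head_outputs)
    ]


# Toy data (assumed): 3 reference runs, 2 heads, 2-dimensional outputs.
reference_runs = [
    [[1.0, 2.0], [0.0, 0.0]],
    [[3.0, 4.0], [2.0, 2.0]],
    [[5.0, 6.0], [4.0, 4.0]],
]
head_means = mean_head_outputs(reference_runs)

# Ablate head 1 only; head 0 keeps its input-specific output.
current = [[9.0, 9.0], [7.0, 7.0]]
patched = mean_ablate(current, head_means, heads_to_ablate=[1])
```

Replacing a head's output with its mean, rather than zeroing it, removes the input-specific signal the head carries while keeping downstream activations close to their usual range. In a real model this replacement would typically be applied through a hook on the attention layer.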
Evaluation Highlights
  • Ablating PIH-heads reduces prompt-induced hallucinations in counting tasks by up to 54%, restoring visually grounded responses.
  • In a color identification task, the same intervention reduces prompt-color copying by up to 94.25%.
  • LLaVA-OneVision shows a 4.35% improvement in baseline counting accuracy (on correct prompts) after ablation, indicating better general visual grounding.
Breakthrough Assessment
8/10
Strong mechanistic finding: identifies a specific, removable cause of a common VLM failure mode. The cross-task generalization (counting to color) without retraining suggests a fundamental architectural insight.