VIB-Probe: Detecting and Mitigating Hallucinations in Vision-Language Models via Variational Information Bottleneck

F Zhang, Y Wu, Z Wang, X Wang, C Lv, X Huang…
School of Computer Science, Fudan University, Shanghai Key Laboratory of Intelligent Information Processing
arXiv, January 2026
Factuality MM

📝 Paper Summary

Hallucination suppression · Internal state analysis
VIB-Probe detects hallucinations by distilling internal attention dynamics into a compact latent representation using the Information Bottleneck principle, and mitigates them by suppressing specific hallucination-sensitive attention heads.
Core Problem
Vision-Language Models (VLMs) frequently hallucinate objects or relations not present in images, and existing detectors relying on surface-level output statistics fail to capture the internal mechanistic causes of these errors.
Why it matters:
  • Hallucinations undermine trust in VLMs for high-stakes applications requiring precise visual grounding
  • Current detection methods relying on logit entropy or external tools overlook the internal attention dynamics where errors originate
  • Existing mitigation strategies often require expensive retraining or heavy external verification, lacking efficient inference-time control
Concrete Example: In an image captioning task, a VLM might generate 'a man holding a frisbee' when the image contains no frisbee. The output probabilities can still be high because of language priors (men often hold frisbees in the training data), so output-level statistics miss the error. The attention heads responsible for visual grounding, however, show distinct, detectable patterns of 'informational drift', and these are what VIB-Probe captures.
Key Novelty
VIB-Probe (Variational Information Bottleneck Probe)
  • Treats the collection of all attention head outputs across layers as a high-dimensional signal containing both hallucination cues and noise
  • Applies the Information Bottleneck principle to compress this signal into a compact latent variable that maximizes prediction of hallucination labels while discarding irrelevant syntactic noise
  • Uses gradients from this trained probe to identify specific 'hallucination-sensitive' attention heads and suppresses them during inference to fix errors
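The three steps above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the summary does not give the exact objective, so the sketch assumes the standard variational Information Bottleneck form (a diagonal-Gaussian encoder, a standard-normal prior, and a binary cross-entropy prediction term weighted against a KL compression term). The names `vib_loss`, `top_k_sensitive_heads`, and `suppress_heads`, the weight `beta`, and the choice of suppressing heads by scaling their outputs are all hypothetical.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def binary_cross_entropy(p, y, eps=1e-12):
    """Negative log-likelihood of label y (0 or 1) under probability p."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def vib_loss(mu, log_var, p_halluc, label, beta=1e-3):
    """Assumed VIB objective: prediction term + beta * compression term.

    mu, log_var: parameters of the latent produced by the probe's encoder
    p_halluc:    probe's predicted probability that the output is hallucinated
    beta:        hypothetical trade-off weight (not taken from the paper)
    """
    return (binary_cross_entropy(p_halluc, label)
            + beta * kl_to_standard_normal(mu, log_var))


def top_k_sensitive_heads(saliency, k):
    """Pick the k (layer, head) pairs with the largest saliency score.

    saliency maps (layer, head) -> a non-negative score, e.g. the magnitude
    of the probe's gradient with respect to that head's output (assumption).
    """
    return sorted(saliency, key=saliency.get, reverse=True)[:k]


def suppress_heads(head_outputs, selected, scale=0.0):
    """Scale the outputs of the selected heads; leave the rest unchanged."""
    return {key: [scale * v for v in vec] if key in selected else list(vec)
            for key, vec in head_outputs.items()}
```

In practice the encoder, the gradients, and the head outputs would come from the VLM and the trained probe; here they are plain Python lists and dicts so the arithmetic of each step is visible.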
Architecture
Figure 2: The overall framework of VIB-Probe for detection and mitigation.
Evaluation Highlights
  • Outperforms state-of-the-art baselines on generative benchmarks such as M-HalDetect by 2.84% AUROC, indicating better handling of free-form text
  • Generalizes across distributions: performance holds when the probe is trained on one dataset (POPE-Popular) and tested on others, whereas probing baselines degrade significantly
  • Mitigation strategy improves CHAIR metrics on COCO captioning, reducing object hallucinations more effectively than contrastive decoding methods like VCD
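For readers unfamiliar with CHAIR, the sketch below shows how its two commonly reported variants are computed: the instance-level score (the fraction of mentioned objects that are not in the image) and the sentence-level score (the fraction of captions containing at least one such object). It assumes object mentions have already been extracted from each caption and mapped to the ground-truth vocabulary; the function name `chair_scores` and the toy data are hypothetical and unrelated to the paper's results.

```python
def chair_scores(mentioned_per_caption, ground_truth_per_image):
    """Return (CHAIR_i, CHAIR_s) for paired captions and images.

    mentioned_per_caption:  list of sets, objects mentioned in each caption
    ground_truth_per_image: list of sets, objects actually present
    """
    total_mentions = 0
    hallucinated_mentions = 0
    captions_with_hallucination = 0
    for mentioned, truth in zip(mentioned_per_caption, ground_truth_per_image):
        hallucinated = mentioned - truth  # mentioned but not in the image
        total_mentions += len(mentioned)
        hallucinated_mentions += len(hallucinated)
        captions_with_hallucination += bool(hallucinated)
    n_captions = len(mentioned_per_caption)
    chair_i = hallucinated_mentions / total_mentions if total_mentions else 0.0
    chair_s = captions_with_hallucination / n_captions if n_captions else 0.0
    return chair_i, chair_s


# Toy data: the first caption mentions a frisbee that is not in the image.
captions = [{"man", "frisbee"}, {"dog"}]
images = [{"man", "grass"}, {"dog"}]
chair_i, chair_s = chair_scores(captions, images)
```

Lower is better for both scores: on the toy data, one of three mentioned objects is hallucinated and one of two captions is affected.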
Breakthrough Assessment
7/10
Strong methodological contribution: applies VIB to internal states for hallucination detection. The gradient-based mitigation is clever and needs no retraining of the base model, though the probe itself must be trained. Results are consistent across architectures.