Suppressing VLM Hallucinations with Spectral Representation Filtering

Ameen Ali, Tamim Zoabi, Lior Wolf
Tel Aviv University
arXiv (2025)
MM Factuality Benchmark

📝 Paper Summary

Vision-Language Models (VLMs) · Hallucination Mitigation · Mechanistic Interpretability
Spectral Representation Filtering (SRF) identifies directions in a model's feature space that are associated with hallucinations and dampens them with a soft spectral filter applied once, directly to the model's weights, without retraining.
Core Problem
Vision-language models frequently fabricate objects or attributes not present in images due to over-reliance on language priors and statistical biases.
Why it matters:
  • Hallucinations compromise reliability in safety-critical applications requiring accurate visual interpretation
  • Current mitigation methods such as decoding-time adjustments (e.g., beam search) or post-hoc editing add substantial inference overhead (5-10x slowdown)
  • Retraining-based solutions require expensive data curation and computational resources
Concrete Example: When describing a grayscale image, a VLM might hallucinate that it is 'vibrant' due to linguistic priors. SRF detects the internal activation pattern responsible for this bias and dampens it, restoring a factual description.
Key Novelty
Spectral Representation Filtering (SRF)
  • Treats hallucination as a signal processing problem: analyzes the covariance of differences between truthful and hallucinatory internal states to find 'hallucination modes' (directions of high variance)
  • Applies a soft spectral filter to the feed-forward network weights, damping these specific modes to equalize feature variance without removing semantic content
  • Operates entirely post-hoc (after training) and pre-inference (modifies weights once), resulting in zero runtime cost
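The three steps above can be sketched in NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the summary does not give the filter formula, so the gain function `1 / (1 + alpha * lambda / lambda_max)`, the choice of the feed-forward output projection as the filtered matrix, and all function and variable names are hypothetical.

```python
import numpy as np


def hallucination_modes(h_truthful, h_hallucinated):
    """Eigen-decompose the covariance of paired hidden-state differences.

    Both inputs have shape (n_pairs, d). Returns eigenvalues in ascending
    order and the matching eigenvectors as columns.
    """
    diff = h_hallucinated - h_truthful
    diff = diff - diff.mean(axis=0, keepdims=True)
    cov = diff.T @ diff / (diff.shape[0] - 1)
    return np.linalg.eigh(cov)


def soft_filter(eigvals, eigvecs, alpha=1.0):
    """Build a (d, d) filter V diag(g) V^T that damps high-variance modes.

    ASSUMPTION: the gain g_i = 1 / (1 + alpha * lambda_i / lambda_max) is an
    illustrative choice; the summary only says the filter is "soft".
    """
    lam = np.clip(eigvals, 0.0, None)
    lam_max = max(lam.max(), 1e-12)  # avoid division by zero
    gains = 1.0 / (1.0 + alpha * lam / lam_max)
    return (eigvecs * gains) @ eigvecs.T


def filter_ffn_weights(w_out, filt):
    """Fold the filter into an output projection of shape (d, d_ff).

    If y = w_out @ a, then filt @ y = (filt @ w_out) @ a, so the filter is
    applied once to the weights and nothing extra runs at inference time.
    """
    return filt @ w_out


# Synthetic demo: differences are dominated by one direction `u`.
rng = np.random.default_rng(0)
d, d_ff, n = 8, 32, 200
u = np.zeros(d)
u[0] = 1.0
h_true = rng.normal(size=(n, d))
h_hall = h_true + 3.0 * rng.normal(size=(n, 1)) * u + 0.05 * rng.normal(size=(n, d))

eigvals, eigvecs = hallucination_modes(h_true, h_hall)
filt = soft_filter(eigvals, eigvecs, alpha=1.0)
w_out = rng.normal(size=(d, d_ff))
w_filtered = filter_ffn_weights(w_out, filt)

e1 = np.zeros(d)
e1[1] = 1.0
print("gain along dominant mode:", np.linalg.norm(filt @ u))
print("gain along another axis: ", np.linalg.norm(filt @ e1))
```

In this toy setup the dominant difference direction is attenuated while the remaining directions pass through almost unchanged, and the filtered matrix has the same shape as the original, which is why a weight-space edit of this kind adds no per-token cost.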
Evaluation Highlights
  • Achieves state-of-the-art faithfulness on MSCOCO, POPE, and A-OKVQA benchmarks across three model families (LLaVA-1.5, MiniGPT-4, mPLUG-Owl2)
  • Adds no inference latency, unlike decoding-based baselines such as VCD, which slow down generation
  • Consistently reduces hallucination rates (e.g., lower CHAIR scores) without degrading caption quality or detail
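For readers unfamiliar with CHAIR, the metric counts objects mentioned in a caption that are absent from the image's ground-truth annotations. The sketch below is a simplified illustration, not the benchmark's official scorer: it assumes the mentioned objects have already been extracted and normalized into sets, and the function name and inputs are hypothetical.

```python
def chair_scores(mentioned_per_caption, present_per_image):
    """Return (CHAIR_i, CHAIR_s) for parallel lists of object sets.

    CHAIR_i: hallucinated object mentions / all object mentions.
    CHAIR_s: captions with at least one hallucinated object / all captions.
    ASSUMPTION: object extraction and synonym mapping happen upstream.
    """
    total_mentions = 0
    hallucinated_mentions = 0
    captions_with_hallucination = 0
    for mentioned, present in zip(mentioned_per_caption, present_per_image):
        hallucinated = [obj for obj in mentioned if obj not in present]
        total_mentions += len(mentioned)
        hallucinated_mentions += len(hallucinated)
        captions_with_hallucination += bool(hallucinated)
    n_captions = len(mentioned_per_caption)
    chair_i = hallucinated_mentions / total_mentions if total_mentions else 0.0
    chair_s = captions_with_hallucination / n_captions if n_captions else 0.0
    return chair_i, chair_s


# Two captions; the second mentions a "lamp" that is not in the image.
mentioned = [{"dog", "frisbee"}, {"cat", "sofa", "lamp"}]
present = [{"dog", "frisbee", "grass"}, {"cat", "sofa"}]
chair_i, chair_s = chair_scores(mentioned, present)
print(chair_i, chair_s)  # 0.2 0.5
```

Lower values on both scores indicate fewer hallucinated objects.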
Breakthrough Assessment
8/10
Offers a mathematically elegant, training-free solution to a major VLM problem with zero inference cost. It surpasses heavy decoding-time methods, making it highly practical for deployment.