Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization

Zhiyuan Zhao, Bin Wang, Linke Ouyang, Xiaoyi Dong, Jiaqi Wang, Conghui He
Shanghai AI Laboratory
arXiv (2023)
MM Factuality RL Benchmark

📝 Paper Summary

Large Vision-Language Models (LVLMs): Hallucination Mitigation
HA-DPO mitigates hallucinations in LVLMs by reframing the problem as preference optimization. It uses a style-consistent dataset construction pipeline so that the model learns factuality rather than stylistic patterns.
Core Problem
Large Vision-Language Models (LVLMs) frequently suffer from hallucinations—generating plausible but incorrect details about images—which limits their reliability in critical tasks.
Why it matters:
  • Hallucinated details (e.g., non-existent objects, wrong attributes) mislead users and can have severe consequences in fields like medical diagnostics.
  • Existing Supervised Fine-Tuning (SFT) methods require expensive high-quality annotations, while post-hoc correction methods increase inference latency and depend on external tools.
  • Standard RLHF/DPO approaches are sensitive to data quality and distribution shift. When preferred and rejected responses differ in writing style, models can learn to tell them apart by style rather than by factual content.
Concrete Example: When asked to describe an image, an LVLM might confidently mention a 'red car' that isn't present. Standard training might try to correct this with a human-written caption. If that caption is written in a different style from the model's own outputs, however, the model can learn to mimic the style instead of correcting the hallucination.
Key Novelty
Hallucination-Aware Direct Preference Optimization (HA-DPO) with Style-Consistent Data
  • Reframes hallucination elimination as a DPO preference task where the model learns to favor non-hallucinatory outputs over hallucinatory ones without a separate reward model.
  • Introduces a 'Style-Consistent' data construction pipeline: GPT-4 rewrites both the correct (positive) and incorrect (negative) responses to share the same linguistic style, preventing the model from exploiting style shortcuts.
  • Proposes Sentence-level Hallucination Ratio (SHR), a fine-grained metric for evaluating hallucinations beyond fixed object categories.
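HA-DPO builds on Direct Preference Optimization, which scores each response by an implicit reward, β·log(π_θ/π_ref), instead of training a separate reward model. The loss then pushes the preferred (non-hallucinatory) response's reward above the rejected (hallucinatory) one's. Below is a minimal sketch of the standard DPO loss for a single pair. It assumes the sequence-level log-probabilities have already been computed. The helper names (`dpo_loss`, `softplus`), the argument names, and `beta=0.1` are illustrative assumptions, not the paper's code or settings, and HA-DPO's actual training objective may include terms not shown here.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(
    policy_logp_pos: float,
    policy_logp_neg: float,
    ref_logp_pos: float,
    ref_logp_neg: float,
    beta: float = 0.1,  # illustrative value, not taken from the paper
) -> float:
    """Standard DPO loss for one preference pair (hypothetical helper).

    "pos" is the non-hallucinatory (preferred) response and "neg" the
    hallucinatory (rejected) one. Each argument is the summed token
    log-probability log pi(y | image, prompt) under the trainable policy
    or the frozen reference model.
    """
    # Implicit rewards beta * log(pi_theta(y) / pi_ref(y)); no reward model is trained.
    reward_pos = beta * (policy_logp_pos - ref_logp_pos)
    reward_neg = beta * (policy_logp_neg - ref_logp_neg)
    # -log(sigmoid(margin)) is computed stably as softplus(-margin).
    return softplus(-(reward_pos - reward_neg))


# Policy identical to the reference: the loss equals log(2).
print(dpo_loss(-11.0, -11.0, -11.0, -11.0))
# Policy favors the faithful answer more than the reference does: lower loss.
print(dpo_loss(-10.0, -12.0, -11.0, -11.0))
```

Because HA-DPO's positive and negative responses are rewritten into the same style, the reward margin in a loss like this is intended to reflect factual differences between the pair rather than stylistic ones.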
Evaluation Highlights
  • MiniGPT-4's POPE accuracy improved from 51.13% to 86.13% (+35 absolute points) after HA-DPO training.
  • MiniGPT-4's MME score increased from 932.00 to 1326.46 (+42.32% relative improvement).
  • HA-DPO stabilizes training. Under standard DPO, fluency degrades over time due to distribution shift, whereas the style-consistent approach maintains sentence fluency throughout optimization.
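The two MiniGPT-4 figures use different kinds of change. The POPE gain is an absolute difference in accuracy points, while the MME gain is a change relative to the baseline score. A quick check using only the numbers quoted above follows. The helper names are hypothetical.

```python
def absolute_gain(before: float, after: float) -> float:
    """Difference in the metric's own units (e.g., accuracy points)."""
    return after - before


def relative_gain_pct(before: float, after: float) -> float:
    """Change as a percentage of the baseline value."""
    return 100.0 * (after - before) / before


pope_before, pope_after = 51.13, 86.13    # POPE accuracy (%), as reported above
mme_before, mme_after = 932.00, 1326.46   # MME score, as reported above

print(f"POPE: +{absolute_gain(pope_before, pope_after):.2f} points")   # POPE: +35.00 points
print(f"MME: +{relative_gain_pct(mme_before, mme_after):.2f}% relative")  # MME: +42.32% relative
```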
Breakthrough Assessment
8/10
Significant performance jumps on standard benchmarks (POPE/MME) and addresses a critical DPO failure mode (style exploitation) in multimodal settings. The automated data pipeline reduces reliance on expensive human feedback.