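DPO: Direct Preference Optimization, a method that aligns a language model with preferences by training the policy directly on pairwise comparisons (a preferred and a dispreferred response to the same prompt) with a classification-style loss, without explicitly training a reward model or running reinforcement learning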
POPE: Polling-based Object Probing Evaluation, a benchmark for evaluating object hallucination in large vision-language models (LVLMs) by asking yes/no questions about whether a given object is present in an image
MME: Multimodal Evaluation, a comprehensive benchmark for evaluating LVLM performance across a range of perception and cognition subtasks
Hallucination: The phenomenon in which a model generates content (objects, attributes, relationships) that is not present in the source image
Style Consistency: Ensuring that positive and negative training samples in a preference dataset share the same linguistic patterns (length, tone, vocabulary) so the model optimizes for content, not style
Visual Genome: A large-scale dataset with detailed image annotations (objects, attributes, relationships) used here as ground truth for hallucination detection
SHR: Sentence-level Hallucination Ratio, a metric proposed in this paper that measures the proportion of generated sentences containing hallucinations, so that errors beyond object existence (e.g. wrong attributes or relationships) are also counted
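To make the DPO entry concrete, the sketch below computes the standard DPO loss for one preference pair: the negative log-sigmoid of the scaled difference between the policy-vs-reference log-ratios of the preferred and dispreferred responses. It assumes you already have summed response log-probabilities from the trained policy and a frozen reference model; the function name and the default `beta=0.1` are illustrative choices, not values taken from this paper.

```python
import math


def dpo_loss(policy_logp_chosen: float, policy_logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """DPO loss for a single preference pair (illustrative sketch).

    Each argument is the summed log-probability of a full response given the
    same prompt: "chosen" is the preferred response, "rejected" the other.
    beta is an assumed temperature on the implicit reward.
    """
    # Implicit rewards: beta * log(pi_theta / pi_ref) for each response.
    chosen_reward = beta * (policy_logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (policy_logp_rejected - ref_logp_rejected)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) == softplus(-margin), computed in a stable form.
    z = -margin
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


# When the policy equals the reference, the margin is 0 and the loss is log 2.
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
# Raising the preferred response's likelihood relative to the reference lowers the loss.
print(dpo_loss(-5.0, -10.0, -10.0, -10.0))
```

No reward model appears anywhere: the "reward" is implicit in the log-ratio against the reference model, which is what lets DPO skip the separate reward-modeling stage.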
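The POPE entry describes yes/no probing of object presence. Such results are commonly summarized with accuracy, precision, recall, F1 (treating "yes" as the positive class) and the fraction of "yes" answers. The sketch below computes those numbers. The answer-parsing rule (an answer counts as "yes" if it starts with "yes") is a simplifying assumption, not POPE's official parser.

```python
def pope_metrics(answers: list[str], labels: list[bool]) -> dict[str, float]:
    """Summarize yes/no object-probing results (illustrative sketch).

    answers: raw model answers to "Is there a <object> in the image?"
    labels:  True if the object is actually present in the image.
    """
    if len(answers) != len(labels) or not labels:
        raise ValueError("answers and labels must be non-empty and equal length")
    # Assumed parsing rule: an answer counts as "yes" if it starts with "yes".
    preds = [a.strip().lower().startswith("yes") for a in answers]
    tp = sum(p and l for p, l in zip(preds, labels))
    fp = sum(p and not l for p, l in zip(preds, labels))
    tn = sum(not p and not l for p, l in zip(preds, labels))
    fn = sum(not p and l for p, l in zip(preds, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(labels),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # A high yes-ratio signals a bias toward affirming objects exist.
        "yes_ratio": sum(preds) / len(preds),
    }


print(pope_metrics(["Yes", "yes", "no", "yes"], [True, False, False, True]))
```

In this toy run the model says "yes" to one absent object, which lowers precision (2/3) while recall stays at 1.0. That is the typical signature of object hallucination.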
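The SHR entry reduces to a ratio: hallucinated sentences divided by total sentences in a generated description. The sketch below shows only that arithmetic. Deciding whether a sentence is hallucinated is the hard part, and the judging procedure used in the paper is not reproduced here. Instead, a naive hypothetical judge (`flag_sentence`) flags a sentence if it names an object from a small vocabulary that is absent from a ground-truth object set, such as one built from Visual Genome annotations. This stand-in misses attribute and relationship errors.

```python
import re


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def flag_sentence(sentence: str, object_vocab: set[str], gt_objects: set[str]) -> bool:
    # Hypothetical stand-in judge: hallucinated if the sentence names any
    # vocabulary object that is missing from the ground-truth object set.
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    return any(obj in words and obj not in gt_objects for obj in object_vocab)


def sentence_hallucination_ratio(text: str, object_vocab: set[str],
                                 gt_objects: set[str]) -> float:
    """Fraction of sentences flagged as hallucinated (0.0 for empty text)."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    flags = [flag_sentence(s, object_vocab, gt_objects) for s in sentences]
    return sum(flags) / len(sentences)


caption = "A dog sits on the grass. A frisbee lies nearby. The sky is blue."
vocab = {"dog", "frisbee", "grass", "cat"}
ground_truth = {"dog", "grass", "sky"}
print(sentence_hallucination_ratio(caption, vocab, ground_truth))
```

Here only the frisbee sentence is flagged, giving a ratio of 1/3. Because SHR is computed per sentence, a long description with one hallucinated detail is penalized in proportion to its length rather than simply marked as wrong.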