PIH: Prompt-Induced Hallucinations, errors in which a model generates output consistent with a misleading prompt rather than with the visual evidence
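To make the definition concrete, here is a minimal sketch of how one response to an object-counting question might be labeled. It assumes the prompt asserts a count that may be wrong. The function name `label_response` and its three labels are hypothetical illustrations, not the source's evaluation protocol.

```python
def label_response(true_count: int, prompt_claim: int, model_answer: int) -> str:
    """Label one answer to a counting question (hypothetical scheme).

    - "correct": the answer matches the visual evidence.
    - "pih": the prompt was misleading and the answer follows the prompt.
    - "other_error": the answer is wrong but does not match the prompt either.
    """
    if model_answer == true_count:
        return "correct"
    if prompt_claim != true_count and model_answer == prompt_claim:
        return "pih"
    return "other_error"


print(label_response(true_count=3, prompt_claim=5, model_answer=5))  # pih
print(label_response(true_count=3, prompt_claim=5, model_answer=3))  # correct
```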
Ablation: Selectively disabling specific components (here, attention heads) of a model to study their causal effect on behavior
Mean ablation: Replacing the output of an attention head with its mean activation over a reference dataset, which removes the input-specific signal the head carries while preserving its average contribution to downstream layers
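A minimal sketch of mean ablation, assuming each head's output for one input is available as a plain vector. In a real model this substitution would typically be done with a forward hook on the attention layer. Plain lists are used here only to keep the example dependency-free, and the helper names are hypothetical.

```python
from typing import List

Vector = List[float]


def mean_activation(head_outputs: List[Vector]) -> Vector:
    """Average one head's output vectors over a reference dataset."""
    n = len(head_outputs)
    dim = len(head_outputs[0])
    return [sum(v[i] for v in head_outputs) / n for i in range(dim)]


def mean_ablate_head(per_head: List[Vector], head_idx: int, mean_out: Vector) -> List[Vector]:
    """Return the per-head outputs for one input, with head `head_idx`
    replaced by its dataset mean; all other heads are left unchanged."""
    ablated = [list(v) for v in per_head]  # copy so the caller's data is not mutated
    ablated[head_idx] = list(mean_out)
    return ablated


# Hypothetical outputs of head 0, collected over a 3-example reference set.
reference = [[1.0, 0.0], [3.0, 2.0], [2.0, 4.0]]
mu = mean_activation(reference)
print(mu)  # [2.0, 2.0]

# Outputs of two heads for a new input; ablate head 0 only.
print(mean_ablate_head([[9.0, -9.0], [0.5, 0.5]], 0, mu))  # [[2.0, 2.0], [0.5, 0.5]]
```

Substituting the mean, rather than zeros, keeps the input distribution seen by later layers close to what they saw during training. Only the head's input-specific variation is removed.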
Discrepancy distance: The magnitude of the difference between the ground truth (e.g., 3 objects) and the prompt's claim (e.g., 5 objects)
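Written as a formula, with $n_{\text{true}}$ the ground-truth count and $n_{\text{prompt}}$ the count asserted by the prompt (symbol names chosen here for illustration), the example above gives a distance of 2:

```latex
d = \lvert n_{\text{prompt}} - n_{\text{true}} \rvert,
\qquad \text{e.g. } d = \lvert 5 - 3 \rvert = 2
```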
Sycophancy: The tendency of a model to agree with or conform to the user's input or stated bias, even when that input is incorrect
LLaVA-OneVision: A state-of-the-art open-source Vision–Language Model family
Attention head: A sub-component of a Transformer's multi-head attention layer; each head computes its own attention weights and can learn to focus on different parts of the input sequence
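For reference, a single head in the standard Transformer formulation computes scaled dot-product attention. The sketch below follows the usual convention of query, key, and value projections `w_q`, `w_k`, and `w_v`. It uses plain-Python lists for readability, whereas real models use batched tensor operations. A multi-head layer runs several such heads in parallel and concatenates their outputs.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product of a (n x m) and b (m x p)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    top = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(x: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Tuple[Matrix, Matrix]:
    """One head of scaled dot-product attention.

    x: (seq_len, d_model); w_q, w_k, w_v: (d_model, d_head).
    Returns (output, weights); weights[i][j] is how strongly position i attends to j.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_head = len(w_q[0])
    scores = matmul(q, transpose(k))  # (seq_len, seq_len) query-key similarities
    weights = [softmax([s / math.sqrt(d_head) for s in row]) for row in scores]
    return matmul(weights, v), weights


identity = [[1.0, 0.0], [0.0, 1.0]]
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, w = attention_head(x, identity, identity, identity)
print([round(sum(row), 6) for row in w])  # [1.0, 1.0, 1.0]
```

Each row of `weights` is a probability distribution over input positions. That distribution is what "focusing on different parts of the input" means concretely.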
Format copying: A behavior where the model outputs the correct answer but mimics the stylistic format (e.g., digit vs. word) of the incorrect prompt
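A minimal sketch of a per-example check for format copying on counting questions. It assumes counts appear either as digits or as English number words up to ten. The helper names and the small word table are hypothetical. The check only flags answers that fit the definition. Attributing the format to the prompt would also require comparing against the model's format when the prompt is not misleading.

```python
import re
from typing import Optional, Tuple

# Hypothetical lookup covering small counts only.
NUMBER_WORDS = {"zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
                "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}


def parse_count(text: str) -> Optional[Tuple[int, str]]:
    """Return (value, format) for the first count in `text`; format is "digit" or "word"."""
    for token in re.findall(r"[a-z]+|\d+", text.lower()):
        if token.isdigit():
            return int(token), "digit"
        if token in NUMBER_WORDS:
            return NUMBER_WORDS[token], "word"
    return None


def is_format_copying(prompt: str, answer: str, true_count: int) -> bool:
    """True if the answer is correct but uses the misleading prompt's numeral format."""
    p, a = parse_count(prompt), parse_count(answer)
    if p is None or a is None:
        return False
    prompt_value, prompt_format = p
    answer_value, answer_format = a
    return (prompt_value != true_count           # the prompt is misleading
            and answer_value == true_count       # the answer is still correct
            and answer_format == prompt_format)  # but mimics the prompt's style


prompt = "There are five cats in the image. How many cats?"
print(is_format_copying(prompt, "There are three cats.", 3))  # True
print(is_format_copying(prompt, "3", 3))  # False
```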