VLM: Vision-Language Model—an AI system that processes both images and text to generate descriptions or answer questions
Object Hallucination (OH): The generation of text describing objects, attributes, or relations that are not actually present in the input image
Covariance Matrix: A matrix representing how different variables (here, dimensions of the feature vector) change together; used to capture the shape of the data distribution
Eigendecomposition: Factorizing a matrix into its eigenvectors (directions) and eigenvalues (for a covariance matrix, the variance along each of those directions)
Spectral Filtering: A technique from signal processing that modifies a signal by amplifying or attenuating specific frequency components (here, variance directions)
FFN: Feed-Forward Network—a component within Transformer layers that processes information independently at each token position
CHAIR: Caption Hallucination Assessment with Image Relevance—a metric measuring the proportion of objects mentioned in generated captions that are not present in the image, reported per object mention (CHAIR_i) and per caption (CHAIR_s)
POPE: Polling-based Object Probing Evaluation—a benchmark that asks Yes/No questions about object presence to test for hallucinations
VCD: Visual Contrastive Decoding—a baseline method that contrasts outputs from original vs. distorted images to reduce hallucinations
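
The Covariance Matrix, Eigendecomposition, and Spectral Filtering entries above fit together as one pipeline, which the following minimal NumPy sketch illustrates. It is a generic illustration rather than the specific method these terms are drawn from: the function name `spectral_filter`, the synthetic data, and the `gains` values are all assumptions chosen for the example.

```python
import numpy as np

def spectral_filter(features, gains):
    """Rescale `features` along each eigenvector of their covariance matrix.

    features: (n_samples, dim) array of feature vectors.
    gains: (dim,) multipliers, ordered like the eigenvalues returned by
           np.linalg.eigh (ascending variance).
    """
    mean = features.mean(axis=0)
    centered = features - mean
    cov = np.cov(centered, rowvar=False)      # (dim, dim) covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)    # columns of eigvecs are directions
    coords = centered @ eigvecs               # coordinates in the eigenbasis
    filtered = (coords * gains) @ eigvecs.T   # scale each direction, rotate back
    return filtered + mean, eigvals

# Synthetic features with very different variance per axis (illustrative only).
rng = np.random.default_rng(0)
x = rng.normal(size=(500, 3)) * np.array([5.0, 1.0, 0.2])

# Hypothetical gains: keep the two lowest-variance directions, remove the largest.
gains = np.array([1.0, 1.0, 0.0])
y, eigvals = spectral_filter(x, gains)
```

A gain of 1 leaves a direction untouched, a gain below 1 attenuates it, and a gain above 1 amplifies it. After filtering, the variance along each direction is the original eigenvalue multiplied by the square of its gain.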