MLLM: Multimodal Large Language Model—AI models capable of processing and generating both text and images
Hallucination: A phenomenon where models generate plausible textual responses that contradict the visual content of the image
MCQ: Multiple-Choice Question—a format used here to evaluate models by asking them to select the correct option from a list
COT: Chain-of-Thought, a prompting technique that encourages models to reason step by step before answering
LongHallGen: The authors' proposed automated pipeline for generating long-context hallucination data using GPT-4V
GPT-4V: GPT-4 with Vision capabilities—a strong proprietary MLLM used here for data generation
Object-level Description: Text describing specific attributes, states, or relations of a single object
Image-level Description: Text covering the main content, background, and details of an entire image in a paragraph
Multi-round Conversation: Simulated dialogue between a user and an assistant about the image content
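
To make the terms above concrete, here is a minimal sketch of how an MCQ hallucination item might be represented and scored. It is an illustration only, not the authors' data format or evaluation code: the class `MCQItem`, its field names, the `context_type` labels, and the `accuracy` helper are all hypothetical, and the sample items are invented for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MCQItem:
    """One multiple-choice question about an image (hypothetical schema)."""
    context_type: str   # assumed labels: "object", "image", or "conversation"
    context: str        # the long text shown to the model alongside the image
    question: str
    options: List[str]
    answer: int         # index of the correct option in `options`


def accuracy(items: List[MCQItem], predictions: List[int]) -> float:
    """Fraction of items where the predicted option index matches the answer."""
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    if not items:
        return 0.0
    correct = sum(1 for item, pred in zip(items, predictions) if pred == item.answer)
    return correct / len(items)


# Invented sample items, one per context type used in this sketch.
items = [
    MCQItem(
        context_type="object",
        context="The mug on the desk is red and has a chipped handle.",
        question="Which statement contradicts the image?",
        options=["The mug is red.", "The mug is on the floor."],
        answer=1,
    ),
    MCQItem(
        context_type="image",
        context="A kitchen with a wooden table, two chairs, and a window.",
        question="Which statement contradicts the image?",
        options=["There are five chairs.", "There is a window."],
        answer=0,
    ),
]

predictions = [1, 1]  # hypothetical model outputs (option indices)
score = accuracy(items, predictions)
```

Scoring by option index rather than free text keeps the check unambiguous: a prediction is either the labeled option or it is not, which is the main appeal of the MCQ format for hallucination evaluation.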