Pelican: The proposed framework, which corrects hallucinations via claim decomposition and Program-of-Thought verification
LVLM: Large Vision Language Model, an AI model that processes both images and text to generate text output
Hallucination: When a model generates visual details that are incorrect or not present in the image
Program-of-Thought (PoT): A prompting strategy where the LLM generates executable code (like Python) to solve reasoning steps instead of just text
Grounding-DINO: An open-set object detector used to find objects specified in text prompts
YOLO: You Only Look Once; a fast, real-time object detection system used here for closed-vocabulary detection
First-order predicates: Logical structures used to decompose complex claims into atomic parts (e.g., Exists, Position, Count)
Visual Table: A structured representation (a pandas DataFrame) of detected objects and their attributes, used to ground the verification process
Woodpecker: A prior baseline method for visual claim verification that Pelican compares against
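
Illustration: The terms Visual Table, first-order predicates, and Program-of-Thought fit together roughly as in the sketch below. This is a minimal, hypothetical example and not Pelican's actual code: a plain list of records stands in for the pandas DataFrame, the detections are made-up values, and the function names and signatures (exists, count, verify) are assumptions chosen to mirror the Exists and Count predicates named above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One row of a toy visual table (stand-in for a DataFrame row)."""
    label: str
    x_center: float    # normalized horizontal position in [0, 1]
    y_center: float    # normalized vertical position in [0, 1]
    confidence: float  # detector score in [0, 1]


# Hypothetical detections, as an object detector might return them.
VISUAL_TABLE = [
    Detection("dog", 0.25, 0.60, 0.91),
    Detection("dog", 0.70, 0.55, 0.88),
    Detection("frisbee", 0.50, 0.20, 0.80),
]


def count(table, label):
    """Count(label): number of detected objects with this label."""
    return sum(1 for row in table if row.label == label)


def exists(table, label):
    """Exists(label): true if at least one such object was detected."""
    return count(table, label) > 0


# The claim "There are two dogs, a frisbee, and a cat" decomposed into
# atomic sub-claims, each paired with a small executable check.
SUB_CLAIMS = {
    "Count(dog) == 2": lambda t: count(t, "dog") == 2,
    "Exists(frisbee)": lambda t: exists(t, "frisbee"),
    "Exists(cat)": lambda t: exists(t, "cat"),
}


def verify(table, sub_claims):
    """Run every check against the table and report which sub-claims hold."""
    return {name: check(table) for name, check in sub_claims.items()}


results = verify(VISUAL_TABLE, SUB_CLAIMS)
for name, holds in results.items():
    print(f"{name}: {'supported' if holds else 'not supported'}")
```

In this toy run the two dog and frisbee sub-claims are supported and the cat sub-claim is not, which marks "a cat" as the hallucinated part of the claim.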