VLM: Vision-Language Model—an AI model trained to understand and process both text and images simultaneously
ReAct: Reasoning + Acting, a prompting paradigm in which an agent interleaves reasoning traces with actions (such as searching the web), using the result of each action to inform its next reasoning step
Browsing Agent: An AI system equipped with tools (browser, search engine) to autonomously navigate the internet to answer questions
Irreducible Reasoning Checklist: A sequential list of minimal necessary steps (search queries, page visits, visual verifications) required to derive the correct answer, used to verify the reasoning process
OA: Overall Accuracy—the percentage of questions where the final answer matches the ground truth
SA: Strict Accuracy—the percentage of questions where the model gets the correct answer AND successfully completes all steps in the reasoning checklist
AVG CS: Average Checklist Score—the average percentage of checklist items completed across all questions
Captioning tool: A separate AI module that converts an image into a text description, often used by text-only agents to 'see' images
Hallucination: Output from an AI model that sounds plausible but is factually incorrect
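
The three metrics defined above (OA, SA, AVG CS) can be illustrated with a minimal Python sketch. It is not the benchmark's actual scoring code. The `QuestionResult` record and the function names are hypothetical, and the sketch assumes that each checklist item is graded as a simple pass/fail, that every question has at least one checklist item, and that AVG CS is computed per question first and then averaged over questions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QuestionResult:
    """Hypothetical per-question record; not taken from the source."""
    answer_correct: bool        # final answer matches the ground truth
    checklist_done: List[bool]  # one pass/fail flag per checklist item


def _require_non_empty(results: List[QuestionResult]) -> None:
    if not results:
        raise ValueError("at least one question result is required")


def overall_accuracy(results: List[QuestionResult]) -> float:
    """OA: percentage of questions with a correct final answer."""
    _require_non_empty(results)
    correct = sum(r.answer_correct for r in results)
    return 100.0 * correct / len(results)


def strict_accuracy(results: List[QuestionResult]) -> float:
    """SA: percentage of questions with a correct answer AND a fully completed checklist."""
    _require_non_empty(results)
    strict = sum(r.answer_correct and all(r.checklist_done) for r in results)
    return 100.0 * strict / len(results)


def average_checklist_score(results: List[QuestionResult]) -> float:
    """AVG CS: per-question checklist completion percentage, averaged over questions.

    Assumes every question has at least one checklist item.
    """
    _require_non_empty(results)
    per_question = [
        100.0 * sum(r.checklist_done) / len(r.checklist_done) for r in results
    ]
    return sum(per_question) / len(per_question)


if __name__ == "__main__":
    # Made-up illustrative data, not results from any real evaluation.
    demo = [
        QuestionResult(True, [True, True, True]),           # correct, checklist complete
        QuestionResult(True, [True, False]),                # correct, checklist incomplete
        QuestionResult(False, [False, False, True, False]),
        QuestionResult(False, [True, True]),                # checklist complete, wrong answer
    ]
    print("OA:", overall_accuracy(demo))
    print("SA:", strict_accuracy(demo))
    print("AVG CS:", average_checklist_score(demo))
```

Under these assumptions SA can never exceed OA, since every question counted toward SA also counts toward OA. The gap between the two shows how often a correct answer was reached without completing the full reasoning checklist.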