ReAct: Reasoning and Acting—a paradigm where LLMs generate reasoning traces and task-specific actions (tool calls) in an interleaved manner
CXR: Chest X-ray—a projection radiograph of the chest used to diagnose conditions affecting the chest, its contents, and nearby structures
VQA: Visual Question Answering—the task of answering natural language questions based on the visual content of an image
Grounding: The process of linking textual concepts (e.g., 'nodule') to specific regions or bounding boxes in an image
LangGraph: A library for building stateful, multi-actor applications with LLMs, used here to manage the agent's reasoning loop
DICOM: Digital Imaging and Communications in Medicine—the international standard for medical images and related information
LMM: Large Multimodal Model—a model capable of processing and generating multiple modalities (e.g., text and images)
Zero-shot: The ability of a model to perform a task without having been trained on examples of that specific task
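
The interleaving of reasoning traces and tool calls described in the ReAct entry can be sketched as a short loop. This is a minimal illustration only, not the agent's actual implementation and not LangGraph code: the tool names (`classify_cxr`, `ground_finding`), their stub outputs, and `scripted_policy` (a fixed stand-in for the LLM) are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tools: names and return strings are illustrative stubs only.
def classify_cxr(image_id: str) -> str:
    return f"stub finding for {image_id}"

def ground_finding(query: str) -> str:
    return f"stub bounding box for {query}"

TOOLS: Dict[str, Callable[[str], str]] = {
    "classify_cxr": classify_cxr,
    "ground_finding": ground_finding,
}

# One step of the loop: (thought, action name, action argument).
Step = Tuple[str, str, str]

def scripted_policy(history: List[str]) -> Step:
    """Stand-in for an LLM: picks the next step from the observations so far."""
    n_obs = sum(1 for line in history if line.startswith("Observation:"))
    if n_obs == 0:
        return ("I should get overall findings first.", "classify_cxr", "image-001")
    if n_obs == 1:
        return ("I should localize the finding.", "ground_finding", "nodule")
    return ("I have enough information.", "finish", "answer based on the two observations")

def react_loop(
    question: str,
    policy: Callable[[List[str]], Step],
    max_steps: int = 5,
) -> Tuple[str, List[str]]:
    """Alternate Thought -> Action -> Observation until the policy finishes."""
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        thought, action, arg = policy(history)
        history.append(f"Thought: {thought}")
        if action == "finish":
            history.append(f"Answer: {arg}")
            return arg, history
        # Run the chosen tool and feed its output back as an observation.
        observation = TOOLS[action](arg)
        history.append(f"Action: {action}({arg})")
        history.append(f"Observation: {observation}")
    return "no answer within step budget", history

if __name__ == "__main__":
    answer, trace = react_loop("Is there a nodule?", scripted_policy)
    print("\n".join(trace))
```

In a real agent the policy would be an LLM call that reads the accumulated history, and the step budget (`max_steps`) guards against a loop that never reaches a final answer.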