REC: Referring Expression Comprehension—locating objects in an image based on a natural language description
CoT: Chain-of-Thought—a prompting or training technique where models generate intermediate reasoning steps before the final answer
GRPO: Group Relative Policy Optimization—a reinforcement learning algorithm that optimizes policies by comparing a group of outputs generated for the same input, removing the need for a separate critic model
SFT: Supervised Fine-Tuning—training a model on labeled examples (here, reasoning traces) to establish a baseline capability
IoU: Intersection over Union—a metric measuring the overlap between a predicted bounding box and the ground truth box
Hallucination: In this context, a model predicting that the described object is present and outputting a bounding box for it, even though no such object exists in the image
Box Hints: Pre-detected bounding boxes provided to the model as visual prompts (e.g., with numbered markers) to ground the reasoning process
Cold Start: The initial SFT phase used to teach the model the desired output format before applying reinforcement learning
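
Two of the terms above, IoU and the group comparison in GRPO, are easier to pin down with a small worked example. The sketch below is illustrative only: the function names `iou` and `group_relative_advantages` are hypothetical, boxes are assumed to be axis-aligned `(x1, y1, x2, y2)` tuples, and the advantage is assumed to be each reward's deviation from the group mean divided by the group standard deviation, which is the commonly described form of GRPO and not something this glossary itself specifies. The reward values in the usage lines are made up.

```python
from statistics import mean, pstdev


def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes.

    Assumption: each box is (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap rectangle, clamped at zero
    # so that disjoint boxes produce an empty intersection.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    # Degenerate (zero-area) boxes have no meaningful overlap.
    return inter / union if union > 0 else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Score each sampled output relative to its own group.

    Assumption: advantage = (reward - group mean) / (group std + eps).
    The group statistics play the role a learned critic would otherwise play.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical usage: two overlapping boxes, and four sampled outputs
# for the same input that received rewards of 1 or 0.
overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

In this sketch, outputs that score above their group's average receive positive advantages and those below receive negative ones, so no separate value model is needed to supply a baseline.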