Reasoning Segmentation: Generating pixel-wise masks for objects based on complex, implicit, or logical text queries rather than simple class labels
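GRPO: Group Relative Policy Optimization, a reinforcement learning (RL) algorithm that samples a group of outputs for the same input and scores each output relative to the others in its group (typically its reward minus the group mean, divided by the group standard deviation), so no separate value function (critic) is needed as a baseline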
Chain-of-Thought (CoT): A prompting or training technique where the model generates intermediate reasoning steps before the final answer
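RefCOCOg: A referring expression dataset built on MS COCO images, pairing objects with natural language descriptions that are longer on average than those in RefCOCO and RefCOCO+; widely used to benchmark referring expression segmentation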
SFT: Supervised Fine-Tuning, i.e., training a model on labeled input-output pairs
IoU: Intersection over Union, a metric for the overlap between a predicted mask or box and the ground truth, computed as the area of their intersection divided by the area of their union (1 for a perfect match, 0 for no overlap)
OOD: Out-of-Distribution, describing data samples that differ significantly from the training data distribution
L1 Distance: The sum of absolute differences between corresponding coordinates, used here to measure how close predicted points or boxes are to their ground-truth targets
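The GRPO entry describes scoring each output against its own group. Below is a minimal Python sketch of that group-relative advantage, assuming one scalar reward per sampled output and the common mean/standard-deviation normalization. The function name `group_relative_advantages` and the `eps` guard are illustrative, and the rest of the policy update (importance ratios, clipping, KL penalty) is deliberately omitted.

```python
from __future__ import annotations

import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Turn the rewards of G outputs sampled for one input into advantages.

    Each reward is compared with its own group's statistics, so no learned
    value function (critic) is needed to provide a baseline.
    """
    mean = statistics.fmean(rewards)
    # Population standard deviation over the group; some implementations
    # use the sample (n - 1) version instead.
    std = statistics.pstdev(rewards)
    # eps keeps the division finite when every output got the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


# Four outputs for the same prompt: one good, one bad, two middling.
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
print([round(a, 3) for a in advantages])
```

Outputs that beat the group average receive positive advantages and the rest receive negative ones; when all rewards in a group are equal, every advantage is zero and that group contributes no learning signal.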
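The IoU entry can be made concrete with a small Python sketch. It assumes masks are given as sets of `(row, col)` foreground pixel coordinates and boxes as `(x1, y1, x2, y2)` corner tuples in continuous coordinates; both representations and the helper names `mask_iou` and `box_iou` are illustrative choices, not taken from the source. Real pipelines usually store masks as boolean arrays, but the arithmetic is the same.

```python
from __future__ import annotations


def mask_iou(pred: set[tuple[int, int]], gt: set[tuple[int, int]]) -> float:
    """IoU of two binary masks given as sets of foreground pixel coordinates."""
    union = pred | gt
    if not union:
        # Both masks empty; scoring this as 1.0 is a convention, not a standard.
        return 1.0
    return len(pred & gt) / len(union)


def box_iou(a: tuple[float, float, float, float],
            b: tuple[float, float, float, float]) -> float:
    """IoU of two axis-aligned boxes (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    # Corners of the intersection rectangle (empty if the boxes do not overlap).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(mask_iou({(0, 0), (0, 1)}, {(0, 1), (1, 1)}))  # one shared pixel out of three
print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))            # 1 / (4 + 4 - 1)
```

Subtracting the intersection once in `union` avoids counting the overlapping region twice, which is the only subtle step in the box case.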
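The L1 distance entry reduces to a short computation. Here is a minimal Python sketch, assuming a point is an `(x, y)` tuple and a box is an `(x1, y1, x2, y2)` tuple in the same coordinate frame as its target; the helper name `l1_distance` is illustrative. Whether the distance is normalized (for example by image size) or turned into a reward depends on the method and is not specified here.

```python
from __future__ import annotations

from collections.abc import Sequence


def l1_distance(pred: Sequence[float], target: Sequence[float]) -> float:
    """Sum of absolute coordinate differences between a prediction and its target."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of coordinates")
    return sum(abs(p - t) for p, t in zip(pred, target))


# Box: |10-12| + |20-18| + |50-50| + |80-85| = 2 + 2 + 0 + 5
print(l1_distance((10, 20, 50, 80), (12, 18, 50, 85)))
# Point: |3-0| + |4-0|
print(l1_distance((3, 4), (0, 0)))
```

The length check guards against silently comparing a point with a box, since `zip` would otherwise truncate the longer sequence without complaint.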