GRPO: Group Relative Policy Optimization—an RL algorithm that optimizes policies by comparing a group of outputs for the same input, eliminating the need for a critic model
SFT: Supervised Fine-Tuning—training a model on a fixed dataset of labeled input-output pairs
MLLM: Multimodal Large Language Model—an AI model capable of processing both text and images (like screenshots)
GUI: Graphical User Interface, the visual interface of computers and phones, made up of elements such as icons, buttons, and text
IoU: Intersection over Union—a metric measuring the overlap between two bounding boxes, commonly used in object detection but replaced here by coordinate accuracy
OOD: Out-of-Domain—testing scenarios that differ significantly from the training data (e.g., training on Mobile, testing on Desktop)
CoT: Chain-of-Thought—a reasoning technique where the model generates intermediate steps (e.g., inside <think> tags) before the final answer
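The GRPO entry's key idea is to replace a critic's value estimate with a comparison inside the group of outputs sampled for the same input. This can be sketched as the group-normalized advantage below. It is a minimal sketch that assumes the common formulation A_i = (r_i - mean(r)) / std(r) over one group of scalar rewards. The function name `group_relative_advantages`, the `eps` guard, and the use of the population standard deviation are illustrative assumptions. The full GRPO objective (clipped policy ratio, KL penalty) is not shown.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Hypothetical helper: score each output relative to its own group.

    All rewards must come from outputs sampled for the same input.
    The group mean acts as the baseline, so no critic model is needed.
    """
    mean = statistics.fmean(rewards)
    # Population std is an assumption here; some implementations use sample std.
    std = statistics.pstdev(rewards)
    # eps avoids division by zero when every output got the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


# Four sampled answers to one prompt, scored 1.0 (correct) or 0.0 (incorrect).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print([round(a, 3) for a in advantages])
```

Answers that beat their siblings receive positive advantages, and weaker ones receive negative advantages. A group in which every reward is equal yields zero advantage for all its outputs and therefore contributes no learning signal.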
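For the IoU entry, the metric reduces to a few lines of arithmetic: the area where two boxes overlap divided by the area they cover together. The sketch below assumes axis-aligned boxes given as hypothetical `(x1, y1, x2, y2)` corner tuples with x1 < x2 and y1 < y2. The coordinate-accuracy measure that replaces IoU here is not sketched, because this entry does not define it.

```python
def iou(box_a, box_b):
    """Intersection over Union for two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp at zero so disjoint boxes give an intersection area of 0.
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 10x10 boxes overlapping in a 5x5 square: 25 / (100 + 100 - 25), about 0.143.
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))
```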