GRPO: Group Relative Policy Optimization, an RL algorithm that optimizes a policy by scoring each sampled output relative to the other outputs in its group, rather than relying on a separate critic model
MobGRPO: The paper's adaptation of GRPO for mobile agents, using trajectory-level advantages and composite rewards (efficiency + success)
SFT: Supervised Fine-Tuning—training on static datasets of expert demonstrations
LVLM: Large Vision-Language Model—a model capable of processing both images (screenshots) and text
Oracle: A powerful model (here, Qwen 2.5 VL 72B) used to evaluate whether an agent successfully completed a task, providing the reward signal
Synthetic Curriculum: A set of training tasks generated automatically rather than collected from humans
World Model: A simulator (here, text-based) that predicts the next state of the environment to check if a generated task is actually solvable
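
To make the GRPO and MobGRPO entries concrete, here is a minimal Python sketch of a group-relative, trajectory-level advantage. It is an illustration under stated assumptions, not the paper's implementation: the glossary only says the reward combines efficiency and success, so the formula in `composite_reward`, the `efficiency_weight` parameter, and the step budget `max_steps` are all hypothetical. Normalizing by the group's standard deviation is the common GRPO formulation and is assumed here as well.

```python
from statistics import mean, pstdev


def composite_reward(success: bool, steps: int, max_steps: int,
                     efficiency_weight: float = 0.5) -> float:
    """Hypothetical composite reward: a success term plus an efficiency bonus.

    The bonus is granted only on success, so a fast failure is never
    preferred over a slow success. This exact form is an assumption.
    """
    success_term = 1.0 if success else 0.0
    efficiency_term = success_term * (1.0 - steps / max_steps)
    return success_term + efficiency_weight * efficiency_term


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Score each trajectory against its own group instead of a critic.

    Each advantage is the reward minus the group mean, divided by the
    group standard deviation (eps avoids division by zero).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# One group: four trajectories sampled for the same task, as (success, steps).
rollouts = [(True, 5), (True, 10), (False, 20), (False, 20)]
rewards = [composite_reward(s, n, max_steps=20) for s, n in rollouts]
advantages = group_relative_advantages(rewards)
```

In this sketch every step of a trajectory would share that trajectory's single advantage value, which is one reading of "trajectory-level". When all trajectories in a group earn the same reward, every advantage is zero and the group contributes no learning signal.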