RLVR: Reinforcement Learning with Verifiable Rewards; RL in which success is determined by a deterministic verifier (e.g., checking whether a file exists)
GRPO: Group Relative Policy Optimization; an RL algorithm that normalizes rewards within a group of outputs sampled for the same input, removing the need for a separate value-function critic
GUI Agent: An AI agent that interacts with a computer via the Graphical User Interface (mouse, keyboard, screenshots) rather than APIs
End-to-End (E2E): A single model that maps inputs (pixels/text) directly to actions, without intermediate planners or specialized tools
Framework-based Agent: A system composed of multiple modules (planner, executor, tool user) that cooperate to solve tasks; often more capable than an end-to-end model, but also more complex
Self-rolling: The process of letting the policy execute a plan itself, so that the resulting trajectory is guaranteed to lie within the policy's own reachable state space
Covariate Shift: The difference in distribution between the training data (expert traces) and the data the model generates during its own operation
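
To make the RLVR and GRPO entries concrete, the following is a minimal sketch of a deterministic verifier reward and of within-group reward normalization. The function names `file_exists_reward` and `group_relative_advantages` are hypothetical, and dividing by the group standard deviation (with a small `eps` for stability) is an assumption based on the commonly described form of GRPO; the glossary itself only states that rewards are normalized within a group.

```python
import os
import statistics


def file_exists_reward(path: str) -> float:
    """Hypothetical deterministic verifier: 1.0 if the file exists, else 0.0."""
    return 1.0 if os.path.isfile(path) else 0.0


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of outputs for the same input.

    Assumption: advantage = (reward - group mean) / (group std + eps).
    The group mean plays the role of the baseline, so no learned critic
    is needed.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Four rollouts for the same task: two passed the verifier, two failed.
    rewards = [1.0, 0.0, 0.0, 1.0]
    print(group_relative_advantages(rewards))
```

Successful rollouts receive positive advantages and failed ones negative advantages. If every rollout in a group earns the same reward, all advantages are zero, so that group contributes no learning signal.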