RLVR: Reinforcement Learning with Verifiable Rewards, i.e., training agents with reinforcement learning on binary, verifiable success signals
GUI Agent: An AI agent that interacts with a graphical user interface (e.g., clicking buttons, typing) to perform tasks
Partial State Observability: The condition where an agent (or verifier) cannot see the full state of the system (e.g., hidden files, memory) from screenshots alone
Agentic Interactive Verification: A paradigm where the evaluator is an agent capable of executing actions to verify task completion, rather than just a passive observer
Latent State: System properties not visible on the screen, such as file permissions, background processes, or file content not currently open
Rejection Sampling: A technique where multiple candidate solutions are generated, a verifier rejects those it judges unsuccessful, and an accepted solution is submitted
Best-of-N: An inference strategy where N trajectories are generated, and the one with the highest reward model score is selected
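To make the contrast between the last few terms concrete, here is a minimal sketch. It assumes a trajectory can be represented as a list of action strings, and that `verify` and `score` are hypothetical callables standing in for a binary verifier (passive or agentic) and a reward model. With a binary verifier, "selecting the best" candidate reduces to keeping one the verifier accepts, whereas Best-of-N ranks candidates by a scalar score. None of these function names come from the source; they are illustrative only.

```python
from typing import Callable, List, Optional

# Hypothetical representation: a trajectory is a list of action strings.
Trajectory = List[str]


def binary_reward(verify: Callable[[Trajectory], bool], traj: Trajectory) -> float:
    """RLVR-style reward: 1.0 if the verifier judges the task complete, else 0.0."""
    return 1.0 if verify(traj) else 0.0


def best_of_n(candidates: List[Trajectory],
              score: Callable[[Trajectory], float]) -> Trajectory:
    """Best-of-N: return the candidate with the highest reward-model score."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=score)


def rejection_sample(candidates: List[Trajectory],
                     verify: Callable[[Trajectory], bool]) -> Optional[Trajectory]:
    """Rejection sampling: discard candidates the verifier rejects and
    submit the first accepted one (None if every candidate is rejected)."""
    accepted = [c for c in candidates if verify(c)]
    return accepted[0] if accepted else None


# Toy example: the "task" counts as complete if a "save" action occurred.
def verify(traj: Trajectory) -> bool:
    return "save" in traj


def score(traj: Trajectory) -> float:
    # Stand-in for a learned reward model: longer trajectories score higher.
    return float(len(traj))


candidates = [["click", "type"], ["click", "type", "save"], ["save"]]
print(best_of_n(candidates, score))          # ['click', 'type', 'save']
print(rejection_sample(candidates, verify))  # ['click', 'type', 'save']
print(binary_reward(verify, ["click"]))      # 0.0
```

In this toy setup both strategies happen to pick the same trajectory, but they rely on different signals: Best-of-N trusts the scorer's ranking, while rejection sampling only needs a pass/fail judgment, which is the same kind of signal RLVR uses as a training reward.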