GRPO: Group Relative Policy Optimization, an RL algorithm that samples a group of outputs for the same prompt and scores each one relative to the group's rewards, rather than relying on a learned value function
RLVR: Reinforcement Learning with Verifiable Rewards—a training paradigm where model outputs are scored based on objective verification (e.g., correct answer, successful code execution)
SVG: Scalable Vector Graphics—an XML-based vector image format that can be generated via code
TTRL: Test-Time Reinforcement Learning, which uses agreement among the model's own sampled outputs (majority vote) at inference/generation time as a proxy for correctness when ground truth is missing
Goldilocks principle: A reward strategy that incentivizes generating tasks of intermediate difficulty (not too hard, not too easy) to maximize learning signal
Proposer: The agent role responsible for formulating visual concepts and questions
Coder: The agent role responsible for translating concepts into executable code to render images
Solver: The agent role responsible for reasoning over the rendered images to answer questions
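
As a rough illustration of how three of the terms above (GRPO, TTRL, and the Goldilocks principle) could fit together, the following Python sketch may help. It is not taken from the source: the function names are hypothetical, the use of population standard deviation and the small `eps` term are assumptions, and the tent-shaped difficulty reward is only one possible shape for a reward that favors intermediate difficulty.

```python
from collections import Counter
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: normalize each reward against its own group.

    `rewards` holds the scores of several outputs sampled for one prompt.
    Population std and `eps` are assumptions made for this sketch.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def majority_vote_rewards(answers):
    """TTRL-style proxy reward: 1.0 if an answer matches the majority, else 0.0.

    Ties are broken in favor of the answer that appears first.
    """
    majority, _ = Counter(answers).most_common(1)[0]
    return [1.0 if a == majority else 0.0 for a in answers]


def goldilocks_reward(solver_accuracy):
    """Hypothetical tent-shaped reward for task difficulty.

    Peaks when the solver succeeds half the time and falls to zero
    when the task is always solved or never solved.
    """
    return 1.0 - 2.0 * abs(solver_accuracy - 0.5)


# Four sampled answers to one question, with no ground truth available.
answers = ["4", "4", "5", "4"]
rewards = majority_vote_rewards(answers)
advantages = group_relative_advantages(rewards)
```

In this sketch the minority answer receives a negative advantage and the majority answers a positive one, so the advantages sum to roughly zero within the group. No value function is involved at any point, which is the property the GRPO entry describes.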