Flow Matching: A generative modeling framework that learns a velocity field to transform a simple noise distribution into a complex data distribution via ODEs
GRPO: Group Relative Policy Optimization—an RL algorithm that estimates advantages by comparing a group of outputs for the same input, removing the need for a learned value critic
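The group-relative advantage estimate can be sketched in a few lines. This is a minimal illustration, not the full GRPO objective; the group size, rewards, and the epsilon term are assumptions for the toy example:

```python
import numpy as np

def grpo_advantages(rewards):
    """Group-relative advantages: normalize each output's reward by the
    mean and std of its own group, so no learned value critic is needed."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)  # epsilon guards against zero std

# Four outputs sampled for the same prompt, scored by a reward model:
adv = grpo_advantages([0.2, 0.8, 0.5, 0.5])
# Outputs above the group mean get positive advantage, below it negative.
```

Because the baseline is the group mean, advantages within a group always sum to (approximately) zero.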
ODE: Ordinary Differential Equation—determines the deterministic path of sample generation in standard flow matching
SDE: Stochastic Differential Equation—adds noise to the generation process, providing the exploration needed for RL
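The ODE/SDE distinction amounts to one extra noise term per integration step. A minimal sketch with a toy linear velocity field standing in for the learned network (the field, step size, and noise scale `sigma` are all assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def velocity(x, t):
    # Toy stand-in for a learned velocity field v_theta(x, t).
    return -x

def ode_step(x, t, dt):
    # Deterministic Euler step: x <- x + v(x, t) * dt
    return x + velocity(x, t) * dt

def sde_step(x, t, dt, sigma=0.5):
    # Euler-Maruyama step: same drift plus Gaussian noise scaled by
    # sqrt(dt); this stochasticity provides the exploration RL relies on.
    noise = rng.normal(size=np.shape(x))
    return x + velocity(x, t) * dt + sigma * np.sqrt(dt) * noise
```

Running `ode_step` twice from the same state gives identical results; running `sde_step` twice does not, which is exactly what lets an RL algorithm sample diverse trajectories from the same prompt.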
Credit Assignment: The problem of determining which specific action or step in a sequence is responsible for the final reward
Reward Hacking: When an RL agent exploits flaws in the reward model to get high scores without actually improving performance (e.g., oversaturated colors)
Mode Collapse: A failure mode where the generative model loses diversity, producing nearly identical outputs across different inputs or random seeds
ELBO: Evidence Lower Bound—a proxy objective used in some RL formulations to approximate the log-likelihood of the data
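The standard form of the bound (written here for a generic latent-variable model with latent z; the specific RL formulation may differ):

```latex
\log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] \;-\; \mathrm{KL}\big(q_\phi(z \mid x)\,\|\,p(z)\big)
```

Maximizing the right-hand side therefore serves as a tractable proxy for maximizing the intractable log-likelihood on the left.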
GenEval: A benchmark for evaluating text-to-image models on compositional generation, such as object counting, colors, and spatial relationships
PickScore: A human-preference-based metric for evaluating the alignment of generated images with text prompts