GRPO: Group Relative Policy Optimization—a reinforcement learning algorithm that estimates advantages by comparing a group of outputs generated from the same input, removing the need for a separate value network
Flow Matching: A generative modeling framework that learns a velocity field to transform noise into data via a deterministic Ordinary Differential Equation (ODE)
ODE: Ordinary Differential Equation, an equation with no random term that describes how a state changes over time; in flow matching, integrating it maps a noise sample to an image deterministically
SDE: Stochastic Differential Equation—a differential equation that includes a random noise term, allowing for probabilistic trajectories
GenEval: A benchmark for evaluating compositional image generation capabilities, such as object counting, spatial relations, and color binding
PickScore: A reward model trained on human preference comparisons; it scores how well an image matches a text prompt, so the higher-scoring of two images is the one it predicts humans would prefer
DPO: Direct Preference Optimization—an offline method to align models using preference pairs without explicit reward modeling
KL divergence: Kullback-Leibler divergence—a measure of how one probability distribution differs from another; used here as a penalty to prevent the model from drifting too far from its pre-trained state
Euler-Maruyama: A numerical method for approximating the solution of a Stochastic Differential Equation (SDE); the stochastic counterpart of the Euler method for ODEs
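
Two of the entries above (GRPO and Euler-Maruyama) are easier to follow with a small sketch. The Python below is illustrative only and is not taken from any particular implementation: the function names `group_relative_advantages` and `euler_maruyama`, the `eps` stabilizer, and the use of the population standard deviation are all assumptions made for this example. It shows rewards being normalized within one group, and a fixed-step integrator that reduces to the ordinary Euler method for an ODE when the noise term is zero.

```python
import math
import random
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of samples for the same input.

    Assumption: population standard deviation plus a small eps to avoid
    division by zero when every reward in the group is identical.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def euler_maruyama(x0, drift, diffusion, t0, t1, n_steps, rng):
    """Integrate dx = drift(x, t) dt + diffusion(x, t) dW with fixed steps.

    With diffusion == 0 this is the plain Euler method for an ODE.
    """
    dt = (t1 - t0) / n_steps
    x, t = x0, t0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment ~ N(0, dt)
        x = x + drift(x, t) * dt + diffusion(x, t) * dw
        t += dt
    return x


if __name__ == "__main__":
    # Advantages are centered on the group mean: above-average samples
    # get positive values, below-average samples get negative values.
    print(group_relative_advantages([1.0, 2.0, 3.0]))

    rng = random.Random(0)
    # Deterministic case (ODE): dx = -x dt, exact solution exp(-t).
    print(euler_maruyama(1.0, lambda x, t: -x, lambda x, t: 0.0, 0.0, 1.0, 1000, rng))
    # Stochastic case (SDE): same drift plus constant noise.
    print(euler_maruyama(1.0, lambda x, t: -x, lambda x, t: 0.5, 0.0, 1.0, 1000, rng))
```

Because the advantages depend only on the rewards within the group, no separate value network is needed to provide a baseline, which is the point made in the GRPO entry.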