GRPO: Group Relative Policy Optimization—an RL algorithm that normalizes advantages within a group of samples for the same prompt to reduce variance without a learned value function.
DanceGRPO: A prior method applying GRPO to diffusion models using independent sequential rollouts.
SDE: Stochastic Differential Equation—a mathematical framework for modeling diffusion processes that includes random noise injection at each step.
branching factor: The number of new child trajectories spawned from a single parent state at a split step.
NFE: Number of Function Evaluations, the count of model forward passes performed during sampling, used as a measure of the computational cost of generating samples.
HPS-v2.1: Human Preference Score v2.1—a reward model trained to predict human aesthetic and alignment preferences for images.
PickScore: A metric evaluating how likely a human would pick a generated image over alternatives.
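KID: Kernel Inception Distance, a metric that compares the distributions of two image sets (typically generated vs. reference) in Inception feature space, computed as an estimate of squared MMD; lower values mean the distributions are closer.
MMD: Maximum Mean Discrepancy, a kernel-based distance between two probability distributions, used here as a two-sample statistic to verify that branching does not distort the diversity of generated samples.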
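
To make the GRPO entry concrete, the following is a minimal sketch of group-relative advantage normalization, assuming the common mean/standard-deviation form. The function name `group_relative_advantages`, the `eps` stabilizer, and the example reward values are illustrative assumptions, not taken from the source.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize the rewards of samples generated for the SAME prompt.

    Each advantage is (reward - group mean) / (group std + eps), so no
    learned value function is needed as a baseline.
    `eps` is an assumed stabilizer for groups whose rewards are all equal.
    """
    if not rewards:
        raise ValueError("rewards must contain at least one value")
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical reward-model scores for three samples of one prompt.
rewards = [0.2, 0.5, 0.8]
advantages = group_relative_advantages(rewards)
```

Samples that score above the group mean receive positive advantages and those below receive negative ones. A group with identical rewards yields all-zero advantages and therefore contributes no learning signal.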