Flow Matching: A generative modeling technique that learns a velocity field to transport a simple prior distribution (noise) to a complex data distribution
GRPO: Group Relative Policy Optimization—an RL algorithm that estimates advantages by comparing multiple outputs generated from the same input, removing the need for a separate value function
SDE Sampling: Stochastic Differential Equation sampling—injects noise during generation to explore diverse trajectories
ODE Sampling: Ordinary Differential Equation sampling—a deterministic generation process used here to estimate the 'expected' outcome of an intermediate latent state
Turning Point: A denoising step at which the local reward trend (slope) flips sign, so that the local direction comes into line with the trajectory's overall (global) improvement
Reward Sparsity: The problem that feedback arrives only at the end of a long sequence, making it difficult for the model to learn which specific actions led to the result
Implicit Interaction: The delayed dependence where an intermediate denoising step affects not just the next state but the entire future trajectory and final outcome
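
To make two of the definitions above concrete, the following is a minimal sketch rather than the method's actual implementation. It assumes that group-relative advantages are formed by normalizing each reward against the mean and standard deviation of its group (a commonly used form of GRPO), and that a turning point can be detected from a list of per-step reward estimates, for example ones obtained by ODE sampling from each intermediate latent. The function names, the `eps` constant, and the exact sign-flip rule are hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each output relative to the other outputs from the same input.

    Assumption: advantage = (reward - group mean) / (group std + eps).
    No learned value function is involved.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def find_turning_points(step_rewards):
    """Return step indices where the local slope flips to match the global trend.

    Assumption: step_rewards[t] is an estimate of the final reward given the
    latent at denoising step t. The local slope at step t is
    step_rewards[t + 1] - step_rewards[t]; the global trend is the sign of
    step_rewards[-1] - step_rewards[0].
    """
    slopes = [b - a for a, b in zip(step_rewards, step_rewards[1:])]
    global_trend = sign(step_rewards[-1] - step_rewards[0])
    turning_points = []
    for t in range(1, len(slopes)):
        previous, current = sign(slopes[t - 1]), sign(slopes[t])
        flipped = previous != 0 and current != 0 and previous != current
        if flipped and current == global_trend:
            turning_points.append(t)
    return turning_points


# Hypothetical data: four outputs for one prompt, and one reward trace.
advantages = group_relative_advantages([0.2, 0.5, 0.5, 0.8])
trace = [0.5, 0.4, 0.3, 0.45, 0.6, 0.55, 0.7]
turning = find_turning_points(trace)  # steps 2 and 5
```

In the trace above, the slope changes from falling to rising at steps 2 and 5, and the trajectory improves overall, so both steps qualify. The flip at step 4 (rising to falling) runs against the global trend and is not counted under this rule.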