Flow Matching: A generative modeling framework that learns a velocity field to transform a simple base distribution (noise) into a target data distribution via an ODE.
Rectified Flow: A specific flow matching formulation that learns approximately straight paths between noise and data, so the ODE can be simulated accurately in only a few integration steps.
Shortcut Models: A technique to distill flow models for few-step inference by enforcing consistency between multi-step and single-step velocity predictions.
PPO: Proximal Policy Optimization, an RL algorithm that stabilizes training by clipping the probability ratio between the new and old policies, which limits how far a single update can move the policy.
DPPO: Diffusion Policy Policy Optimization, a prior method that adapts PPO to fine-tune diffusion-based policies.
ODE: Ordinary Differential Equation, an equation that describes how a quantity changes continuously over time in terms of its derivative.
Trace Estimator: A stochastic method to approximate the trace of a matrix (sum of diagonal elements), often used to estimate changes in log-density in continuous normalizing flows.
Wasserstein regularization: A penalty term based on the Wasserstein distance (earth mover's distance) used to keep the fine-tuned policy close to the pre-trained behavior.
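
To make three of the entries above concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the implementation of any particular method: the function names are hypothetical, the ODE solver is a fixed-step Euler scheme, the PPO function evaluates the standard clipped surrogate for a single sample, and the trace estimator is assumed to be the Hutchinson estimator with Rademacher (+1/-1) probe vectors, which is one common choice.

```python
import random


def euler_integrate(velocity, x0, num_steps):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed-step Euler.

    In flow matching, `velocity` would be the learned velocity field and
    `x0` a noise sample; here both are plain floats for illustration.
    """
    x = x0
    dt = 1.0 / num_steps
    for k in range(num_steps):
        x = x + dt * velocity(x, k * dt)
    return x


def ppo_clipped_objective(ratio, advantage, eps=0.2):
    """Per-sample PPO clipped surrogate: min(r * A, clip(r, 1-eps, 1+eps) * A).

    `ratio` is pi_new(a|s) / pi_old(a|s); `eps` is the clipping range
    (0.2 is a commonly used default, assumed here).
    """
    clipped_ratio = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


def hutchinson_trace(matvec, dim, num_samples, rng):
    """Estimate trace(A) as the average of z^T (A z) over random probes z.

    `matvec(z)` must return the product A z as a list of floats, so A never
    has to be formed explicitly. Probes are Rademacher vectors (+1 or -1).
    """
    total = 0.0
    for _ in range(num_samples):
        z = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        az = matvec(z)
        total += sum(zi * ai for zi, ai in zip(z, az))
    return total / num_samples


if __name__ == "__main__":
    # A perfectly straight path has constant velocity, so one Euler step
    # reaches the same endpoint as many steps.
    straight = lambda x, t: 3.0
    print(euler_integrate(straight, 1.0, 1), euler_integrate(straight, 1.0, 4))

    # A ratio above 1 + eps earns no extra objective when the advantage is positive.
    print(ppo_clipped_objective(1.5, 1.0))

    # Trace of a diagonal matrix, accessed only through matrix-vector products.
    diag = [1.0, 2.0, 3.0]
    print(hutchinson_trace(lambda z: [d * v for d, v in zip(diag, z)], 3, 5, random.Random(0)))
```

The constant-velocity case shows why straight paths matter for few-step inference: Euler integration is exact for them regardless of step count. For a general (non-diagonal) matrix the Hutchinson estimate is unbiased but noisy, and its variance shrinks as `num_samples` grows.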