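VSD: Variational Score Distillation—a method in which a student model minimizes the KL divergence between its own output distribution and the teacher's distribution, using an auxiliary learned score function to estimate the score of the student's distribution
CD: Consistency Distillation—a technique that trains the model so that its predictions from different timesteps along the same sampling trajectory map to the same initial data point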
LRM: Latent Reward Model—a compact proxy network trained to predict reward values directly from latent representations, bypassing the decoder
DiT: Diffusion Transformer—a diffusion model architecture that uses a Transformer backbone instead of a U-Net and scales well to video
HPSv2: Human Preference Score v2—a reward model predicting human aesthetic preference for images/videos
VBench: A comprehensive benchmark for evaluating video generation across dimensions like temporal consistency and imaging quality
CFG: Classifier-Free Guidance—a technique that improves prompt alignment by extrapolating from the unconditional model prediction toward the conditional one
NFE: Number of Function Evaluations—the number of times the neural network is called during inference (tied to the number of sampling steps)
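
As an illustration of the CFG and NFE entries above, here is a minimal sketch in plain Python. The function names `cfg_combine` and `count_nfe` are hypothetical and used only for this example. It assumes the common convention in which a guidance scale of 1 recovers the conditional prediction and larger values extrapolate past it, and it assumes that CFG costs two network calls per sampling step (one conditional, one unconditional); implementations that batch both passes into one call may count NFE differently.

```python
from typing import List


def cfg_combine(pred_uncond: List[float], pred_cond: List[float], scale: float) -> List[float]:
    """Combine predictions with classifier-free guidance.

    Assumed convention: scale = 0 gives the unconditional prediction,
    scale = 1 gives the conditional prediction, and scale > 1
    extrapolates beyond the conditional prediction.
    """
    return [u + scale * (c - u) for u, c in zip(pred_uncond, pred_cond)]


def count_nfe(num_steps: int, use_cfg: bool) -> int:
    """Count network calls for a sampler with num_steps steps.

    Assumption: with CFG, each step calls the network twice
    (conditional and unconditional passes counted separately).
    """
    calls_per_step = 2 if use_cfg else 1
    return num_steps * calls_per_step


if __name__ == "__main__":
    uncond = [0.0, 1.0]
    cond = [1.0, 3.0]
    print(cfg_combine(uncond, cond, 1.0))  # conditional prediction recovered
    print(cfg_combine(uncond, cond, 2.0))  # extrapolated past the conditional prediction
    print(count_nfe(4, use_cfg=True))
    print(count_nfe(50, use_cfg=False))
```

Under these assumptions, a few-step distilled sampler that still relies on CFG pays twice its step count in NFE, which is one reason step count and NFE are worth distinguishing when comparing methods.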