VADER: Video Alignment via DifferEntiable Rewards—the proposed method for aligning video diffusion models using reward gradients.
Reward Gradient: The gradient of the reward function with respect to the generated data (pixels), which is then backpropagated to the model weights.
DDPO: Denoising Diffusion Policy Optimization—a reinforcement learning method for diffusion models that uses policy gradients (treating reward as a black box).
DPO: Direct Preference Optimization, a method originally developed for language models and adapted here for diffusion; it optimizes the model directly on preference pairs without training a separate reward model.
LoRA: Low-Rank Adaptation—a parameter-efficient fine-tuning technique that freezes pre-trained weights and injects trainable rank decomposition matrices.
Truncated Backpropagation: A training technique where gradients are propagated only through the last few denoising steps (often just one) rather than the full sampling chain, which saves memory.
VideoMAE: Video Masked Autoencoder—a model used here as a reward function to classify actions in generated videos.
V-JEPA: Video Joint-Embedding Predictive Architecture—a self-supervised video model used here to score temporal consistency.
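
The Reward Gradient and Truncated Backpropagation entries above can be illustrated with a toy sketch. It is not the paper's implementation: the one-parameter "denoiser" x <- (1 - w) * x, the quadratic reward, and every function name below are hypothetical stand-ins chosen so the chain rule can be written out by hand. The reward is differentiated with respect to the final sample, and that gradient is carried back to the weight w through either all sampling steps or only the last few.

```python
def sample(w, x0, num_steps):
    """Run the toy 'denoiser' x <- (1 - w) * x and return every intermediate state."""
    states = [x0]
    for _ in range(num_steps):
        states.append((1.0 - w) * states[-1])
    return states


def reward(x, target=1.0):
    """Toy differentiable reward: highest when the final sample equals `target`."""
    return -(x - target) ** 2


def reward_grad_wrt_sample(x, target=1.0):
    """Analytic gradient of the toy reward with respect to the generated sample."""
    return -2.0 * (x - target)


def reward_gradient(w, x0, num_steps, backprop_steps):
    """Gradient of the reward with respect to w, backpropagating through only
    the last `backprop_steps` sampling steps. Earlier states are treated as
    constants, which is the truncation."""
    states = sample(w, x0, num_steps)
    start = num_steps - backprop_steps
    dx_dw = 0.0  # states[start] is treated as a constant input
    for k in range(start, num_steps):
        # x_{k+1} = (1 - w) * x_k, so dx_{k+1}/dw = (1 - w) * dx_k/dw - x_k
        dx_dw = (1.0 - w) * dx_dw - states[k]
    # Chain rule: dR/dw = dR/dx_final * dx_final/dw
    return reward_grad_wrt_sample(states[-1]) * dx_dw


if __name__ == "__main__":
    w, x0, steps = 0.1, 4.0, 5
    print("full gradient:     ", reward_gradient(w, x0, steps, backprop_steps=steps))
    print("truncated (1 step):", reward_gradient(w, x0, steps, backprop_steps=1))
```

In this toy setting the truncated gradient has a smaller magnitude than the full one but the same sign, so a gradient-ascent update on w moves in the same direction while only the final step's state has to be kept for the backward pass.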