IDM: Inverse Dynamics Model—a model that predicts the action taken between two consecutive video frames
World Model: A generative model that simulates an environment by predicting future states (video frames) based on past states and actions
GRPO: Group Relative Policy Optimization—a reinforcement learning algorithm that estimates advantages by comparing a group of outputs for the same input, removing the need for a value network
Autoregressive World Model: Generates video by predicting discrete tokens one by one (e.g., MineWorld)
Diffusion World Model: Generates video by iteratively denoising random noise, often using techniques like Diffusion Forcing (e.g., NFD)
FVD: Fréchet Video Distance, a metric that scores generated videos by comparing the distribution of their features against that of real videos; lower values indicate better quality and temporal coherence
VBench: A benchmark suite that evaluates video generation models along multiple separate quality dimensions rather than a single aggregate score
VQ-VAE: Vector Quantized Variational AutoEncoder—compresses images into discrete tokens
SDE: Stochastic Differential Equation—mathematical framework used to model the diffusion process
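
The group-relative advantage in the GRPO entry can be illustrated with a minimal sketch. It assumes the commonly described form in which each output's reward is normalized by the mean and standard deviation of its own group; the function name `group_relative_advantages` and the `eps` stabilizer are illustrative choices, not taken from the source.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each output relative to the other outputs sampled for the same input.

    Assumed form: A_i = (r_i - mean(r)) / (std(r) + eps).
    The group mean acts as the baseline, so no learned value network is needed.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four outputs generated from one input.
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

Outputs that score above the group mean receive positive advantages and those below receive negative ones. If every output in a group earns the same reward, all advantages are zero and that group contributes no learning signal.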