RSM: Reweighted Score Matching—a generalized loss function for training diffusion models where samples are weighted by their importance relative to a target distribution
DPMD: Diffusion Policy Mirror Descent—an algorithm applying RSM to solve the Policy Mirror Descent optimization problem
SDAC: Soft Diffusion Actor-Critic—an algorithm applying RSM to solve the Max-Entropy RL problem
DSM: Denoising Score Matching—the standard objective for training diffusion models, matching the score of a noise-perturbed data distribution
EBM: Energy-Based Model—a probabilistic model whose density is specified only up to a normalizing constant by an energy function, p(x) ∝ exp(−E(x)), often requiring MCMC for sampling
SAC: Soft Actor-Critic—a standard RL algorithm that maximizes expected return plus policy entropy, typically using Gaussian policies
Policy Mirror Descent: An iterative policy optimization method that keeps each new policy close to the previous one through a KL-divergence regularization term, which acts as a soft trust region
Q-function: A function estimating the expected cumulative (typically discounted) future reward from taking a specific action in a specific state and following the policy thereafter