RSSM: Recurrent State-Space Model—a latent dynamics model used in Dreamer that combines deterministic and stochastic components to predict future latent states
ELBO: Evidence Lower Bound—a variational objective function used to train generative models by maximizing the lower bound of the data likelihood
residual action: The incremental change (delta) added to the previous action to obtain the current action, rather than predicting the current action from scratch
ODL: Observation Difference Encoder—a neural network component proposed in this paper that encodes the pixel-level difference between two consecutive frames
POMDP: Partially Observable Markov Decision Process—a mathematical framework for decision-making where the agent cannot directly observe the full state of the environment
MBRL: Model-Based Reinforcement Learning—an approach where the agent learns a model of the environment's dynamics to plan or improve its policy
Dreamer: A state-of-the-art model-based RL algorithm that learns latent dynamics from images and optimizes policies via imagination in the latent space
TD-MPC: Temporal Difference Model Predictive Control—a hybrid model-based/model-free algorithm that plans actions in a learned latent space
DMControl: DeepMind Control Suite—a standard benchmark for continuous control tasks involving physics simulation
SAC: Soft Actor-Critic—a popular model-free RL algorithm for continuous control that maximizes a trade-off between expected return and entropy
KL divergence: Kullback-Leibler divergence—a measure of how one probability distribution differs from another, used here to regularize the residual actions towards a zero-mean Gaussian prior
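To make the last two entries concrete, here is a minimal sketch (not the paper's implementation) of a residual action and its KL regularizer: the policy outputs a Gaussian over the delta, the current action is the previous action plus that delta, and the delta distribution is pulled towards a zero-mean unit-variance prior via the closed-form Gaussian KL. All variable names (`prev_action`, `delta_mu`, `delta_log_sigma`) are hypothetical.

```python
import numpy as np

def kl_to_standard_normal(mu, log_sigma):
    """Closed-form KL( N(mu, sigma^2) || N(0, I) ), summed over action dims."""
    sigma2 = np.exp(2.0 * log_sigma)
    return 0.5 * np.sum(sigma2 + mu ** 2 - 1.0 - 2.0 * log_sigma)

# Residual action: current action = previous action + predicted delta,
# rather than predicting the current action from scratch.
prev_action = np.array([0.2, -0.5])
delta_mu = np.array([0.1, 0.0])          # hypothetical policy head output (delta mean)
delta_log_sigma = np.array([-1.0, -1.0])  # hypothetical policy head output (delta log-std)
action = prev_action + delta_mu           # using the mean delta for illustration

# Regularizer that pushes the delta distribution towards N(0, I),
# i.e. towards "change nothing" unless the data says otherwise.
kl = kl_to_standard_normal(delta_mu, delta_log_sigma)
```

The KL term is zero exactly when the delta distribution equals the standard-normal prior, so minimizing it biases the agent towards small, smooth action changes.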