MBRL: Model-Based Reinforcement Learning—learning a dynamics model of the environment to simulate experience (rollouts) for training a policy
Offline RL: Training RL agents using only a fixed, previously collected dataset without further interaction with the environment
Epistemic Uncertainty: Uncertainty arising from a lack of knowledge or data (model ignorance), which can be reduced with more data
Aleatoric Uncertainty: Uncertainty arising from inherent stochasticity or noise in the environment, which cannot be reduced by more data
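One common way to separate the two kinds of uncertainty (a hedged sketch, not tied to any specific system in this glossary) assumes an ensemble of probabilistic models that each predict a Gaussian over the next state. By the law of total variance, the average of the predicted variances estimates aleatoric noise, while the disagreement between the predicted means estimates epistemic uncertainty:

```python
import numpy as np

# Hypothetical per-member Gaussian predictions for one (state, action) query.
means = np.array([1.0, 1.2, 0.9])       # each member's predicted mean
variances = np.array([0.40, 0.50, 0.45])  # each member's predicted variance

aleatoric = variances.mean()  # average predicted noise: irreducible
epistemic = means.var()       # disagreement between members: shrinks with data
total = aleatoric + epistemic  # law-of-total-variance decomposition

print(f"aleatoric={aleatoric:.3f} epistemic={epistemic:.3f} total={total:.3f}")
```

Collecting more data pushes the members toward agreement, so `epistemic` shrinks, while `aleatoric` converges to the true environment noise.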
RWM: Robotic World Model—a neural network that predicts future observations autoregressively
MOPO: Model-based Offline Policy Optimization—a framework that penalizes rewards by the estimated model uncertainty
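The MOPO penalty can be sketched in a few lines: the reward used for policy training is the model-predicted reward minus a multiple of the estimated model uncertainty, so the policy is pushed away from regions where the model is unreliable. The helper name and the example numbers below are illustrative, not from the MOPO codebase:

```python
import numpy as np

def penalized_reward(reward, uncertainty, lam=1.0):
    """MOPO-style reward: r~(s, a) = r(s, a) - lam * u(s, a).

    `uncertainty` is an epistemic-uncertainty estimate for the model's
    prediction (e.g. the max standard deviation across an ensemble);
    `lam` trades off raw return against pessimism.
    """
    return np.asarray(reward) - lam * np.asarray(uncertainty)

# Two transitions with equal predicted reward: the one where the model
# is uncertain is penalized much more heavily.
rewards = np.array([1.0, 1.0])
uncertainties = np.array([0.1, 2.0])
print(penalized_reward(rewards, uncertainties, lam=0.5))
```

Larger `lam` yields a more conservative policy that stays closer to the support of the offline dataset.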
PPO: Proximal Policy Optimization—a stable on-policy policy-gradient method that keeps each update small via a clipped surrogate objective
POMDP: Partially Observable Markov Decision Process—an environment where the agent receives only partial or noisy observations of the underlying state
Bootstrap Ensemble: Training multiple models, each on a bootstrap resample of the data (often also with different random initializations), to estimate epistemic uncertainty via the variance of their predictions
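A minimal numpy sketch of the idea, with polynomial regressors standing in for neural dynamics models (all names and data here are illustrative): each member is fit on a resample of the data, and the standard deviation across members serves as the uncertainty estimate, which grows far from the training distribution:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: y = sin(x) + noise, observed only on x in [-2, 2].
x_train = rng.uniform(-2.0, 2.0, size=40)
y_train = np.sin(x_train) + rng.normal(scale=0.1, size=x_train.shape)

def fit_bootstrap_ensemble(x, y, n_models=5, degree=4):
    """Fit each member on a bootstrap resample (sampling with replacement)."""
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, len(x), size=len(x))  # bootstrap indices
        models.append(np.polyfit(x[idx], y[idx], degree))
    return models

def predict_with_uncertainty(models, x_query):
    """Mean prediction plus ensemble std-dev as the uncertainty estimate."""
    preds = np.stack([np.polyval(m, x_query) for m in models])
    return preds.mean(axis=0), preds.std(axis=0)

models = fit_bootstrap_ensemble(x_train, y_train)
_, std_in = predict_with_uncertainty(models, np.array([0.0]))   # in-distribution
_, std_out = predict_with_uncertainty(models, np.array([5.0]))  # extrapolation
# Far outside the training range the members extrapolate differently,
# so their disagreement (std_out) is much larger than std_in.
print(std_in[0], std_out[0])
```

This is exactly the uncertainty signal a MOPO-style penalty consumes: high ensemble disagreement marks transitions the learned model cannot be trusted on.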