RSSM: Recurrent State-Space Model, a probabilistic sequence model that splits the latent state into a deterministic component (recurrent memory) and a stochastic component (uncertainty). This lets it predict future latent states and, from them, future observations and rewards.
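To make the deterministic/stochastic split concrete, here is a minimal NumPy sketch of one RSSM step. It is an illustration, not DreamerV3's implementation. A single tanh layer stands in for the recurrent cell (Dreamer uses a GRU). The latent is a diagonal Gaussian, whereas DreamerV3 uses categorical latents. The weights are random rather than learned, and the sizes and names (`rssm_step`, `W_h`, `W_prior`, `W_post`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: deterministic, stochastic, action and observation dims.
H, Z, A, O = 8, 4, 2, 6

# Random linear maps stand in for the learned networks (illustration only).
W_h = rng.normal(scale=0.1, size=(H + Z + A, H))
W_prior = rng.normal(scale=0.1, size=(H, 2 * Z))
W_post = rng.normal(scale=0.1, size=(H + O, 2 * Z))


def gaussian_params(x):
    """Split a vector into a mean and a positive standard deviation."""
    mean, raw_std = np.split(x, 2)
    std = np.log1p(np.exp(raw_std)) + 0.1  # softplus plus a floor keeps std > 0
    return mean, std


def rssm_step(h, z, action, obs=None):
    # Deterministic path: recurrent memory updated from the previous state and action.
    h = np.tanh(np.concatenate([h, z, action]) @ W_h)
    # Prior: guesses the stochastic state from memory alone.
    prior = gaussian_params(h @ W_prior)
    # Posterior: also conditions on the current observation, when one is available.
    if obs is None:
        post = prior
    else:
        post = gaussian_params(np.concatenate([h, obs]) @ W_post)
    mean, std = post
    z = mean + std * rng.standard_normal(Z)  # reparameterised sample
    return h, z, prior, post


# Filter one real observation, then imagine one step ahead without observations.
h, z = np.zeros(H), np.zeros(Z)
h, z, prior, post = rssm_step(h, z, np.zeros(A), obs=rng.standard_normal(O))
h, z, prior, post = rssm_step(h, z, np.ones(A))
```

Passing `obs` samples from the posterior, as during training on real data. Omitting it samples from the prior, which is how the model rolls forward in imagination.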
DreamerV3: A state-of-the-art model-based RL algorithm that learns a world model from experience and trains its policy on trajectories 'imagined' by that model instead of in the real environment.
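PPO: Proximal Policy Optimization, a popular model-free RL algorithm that clips the ratio between new and old policy probabilities so each update stays close to the previous policy, avoiding destructive performance collapse. It is used here as a baseline.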
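The clipping mechanism can be sketched as PPO's clipped surrogate loss. This minimal NumPy version assumes per-sample log-probabilities and precomputed advantages. The function name `ppo_clip_loss` is hypothetical, and `clip_eps = 0.2` is a common default, not a value taken from this document's experiments.

```python
import numpy as np


def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped surrogate loss (to be minimised)."""
    # Probability ratio pi_new(a|s) / pi_old(a|s), computed from log-probs.
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Take the pessimistic (smaller) objective per sample, then negate it.
    return -np.mean(np.minimum(unclipped, clipped))
```

Taking the minimum means that once the ratio leaves [1 - clip_eps, 1 + clip_eps] in the direction the advantage favours, the gradient vanishes, so a single batch cannot push the policy far from the one that collected the data.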
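symlog: The function f(x) = sign(x) * ln(|x| + 1), used to compress values with large or widely varying magnitudes (such as rewards, value targets, or vector observations) so that training is more stable. Its inverse is symexp(x) = sign(x) * (exp(|x|) - 1).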
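A minimal NumPy sketch of symlog and its inverse. `np.log1p` and `np.expm1` compute ln(1 + x) and exp(x) - 1 accurately near zero; the function names simply mirror the definition above.

```python
import numpy as np


def symlog(x):
    # Sign-preserving log compression: roughly the identity near 0, logarithmic far from it.
    return np.sign(x) * np.log1p(np.abs(x))


def symexp(x):
    # Inverse of symlog: maps compressed predictions back to the original scale.
    return np.sign(x) * np.expm1(np.abs(x))


x = np.array([-1000.0, -1.0, 0.0, 1.0, 1000.0])
print(symlog(x))
```

Inputs spanning three orders of magnitude come out in a narrow range: symlog(1.0) is about 0.69 and symlog(1000.0) is about 6.91, while the sign is preserved.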
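GAE: Generalized Advantage Estimation, a method for estimating the advantage (how much better an action was than the policy's average behavior in that state) as an exponentially weighted sum of temporal-difference errors, where a parameter λ trades bias against variance.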
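A minimal NumPy sketch of GAE computed backwards over one trajectory. It assumes `values` has one extra entry (the bootstrap value after the last step) and that `dones[t]` is 1 when the episode ends after step t. `gamma = 0.99` and `lam = 0.95` are common defaults, not values from this document.

```python
import numpy as np


def gae(rewards, values, dones, gamma=0.99, lam=0.95):
    """rewards, dones: length T; values: length T + 1 (last entry bootstraps)."""
    T = len(rewards)
    advantages = np.zeros(T)
    running = 0.0
    for t in reversed(range(T)):
        not_done = 1.0 - dones[t]
        # One-step TD error; no bootstrapping past the end of an episode.
        delta = rewards[t] + gamma * values[t + 1] * not_done - values[t]
        # Exponentially weighted (by gamma * lam) sum of future TD errors.
        running = delta + gamma * lam * not_done * running
        advantages[t] = running
    returns = advantages + values[:-1]  # targets for the value function
    return advantages, returns
```

Setting `lam=0` reduces each advantage to the one-step TD error (low variance, more bias); `lam=1` gives the full discounted return minus the value baseline (less bias, higher variance).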
POMDP: Partially Observable Markov Decision Process—a mathematical framework for decision-making where the agent cannot see the full state of the world.
KL divergence: A measure of how one probability distribution differs from another (it is asymmetric, so not a true distance). Here it keeps the learned posterior distribution close to the prior distribution, regularizing the latent space.
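As an illustration, here is the KL divergence between two categorical distributions (DreamerV2 and DreamerV3 represent their stochastic state with categorical latents). In the RSSM setting, `p` would be the posterior and `q` the prior. The function name and the small `eps` guard against log(0) are assumptions of this sketch.

```python
import numpy as np


def kl_categorical(p, q, eps=1e-8):
    """KL(p || q) for categorical distributions along the last axis."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Sum over categories of p * log(p / q); eps avoids log(0).
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)
```

For example, `kl_categorical([0.9, 0.1], [0.5, 0.5])` is about 0.37, while swapping the arguments gives about 0.51, which shows the asymmetry.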
Action repeat: Holding the same action for k consecutive simulation steps to reduce the decision frequency and smooth control.
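A minimal action-repeat wrapper, assuming a hypothetical environment whose `step(action)` returns `(obs, reward, done)`; this is not a specific library's API. Summing rewards over the repeated steps and stopping early when the episode ends are common conventions, and `CountingEnv` is a hypothetical toy environment used only to demonstrate the wrapper.

```python
class ActionRepeat:
    """Repeat each action k times on a wrapped environment.

    Assumes a hypothetical env interface: step(action) -> (obs, reward, done).
    """

    def __init__(self, env, k):
        if k < 1:
            raise ValueError("k must be at least 1")
        self.env = env
        self.k = k

    def step(self, action):
        total_reward = 0.0
        for _ in range(self.k):
            obs, reward, done = self.env.step(action)
            total_reward += reward  # rewards from the repeated steps are summed
            if done:  # stop early if the episode ends mid-repeat
                break
        return obs, total_reward, done


class CountingEnv:
    """Hypothetical toy env: reward 1 per step, episode ends after `horizon` steps."""

    def __init__(self, horizon):
        self.t = 0
        self.horizon = horizon

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= self.horizon


env = ActionRepeat(CountingEnv(horizon=10), k=4)
print(env.step(0))  # (4, 4.0, False)
```

With k = 4 the agent chooses an action only every fourth simulation step; near the end of an episode the last repeat may be cut short.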