CMDP: Constrained Markov Decision Process—an RL framework in which the agent maximizes expected reward while keeping expected cumulative cost below a specified threshold
World Model: A learned neural network that predicts the environment's dynamics (next state, reward, cost) to allow planning without real-world interaction
RSSM: Recurrent State-Space Model—a specific type of world model architecture used in Dreamer agents that combines deterministic and stochastic components
Lagrangian method: An optimization technique that converts a constrained problem into an unconstrained one by adding a penalty term, weighted by a Lagrange multiplier, that grows as constraint violations increase
Safety Budget: The maximum allowable cumulative cost (e.g., number of collisions) an agent can incur
Discriminator: A network trained to distinguish between actions taken by two different policies; used here to regularize the Safe Actor to behave like the Control Actor
Imagination Rollouts: Simulated trajectories generated by the world model to estimate future values and costs without actual environment interaction
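Two of the terms above can be made concrete with a short sketch. The snippet below shows (1) a dual gradient-ascent update on a Lagrange multiplier, which rises when cumulative cost exceeds the safety budget and decays toward zero otherwise, and (2) an imagination rollout that queries a world model to estimate discounted future cost without touching the real environment. This is a minimal illustration under assumed interfaces, not the actual agent implementation; the names `lagrangian_update`, `imagined_cost`, and `toy_model` are hypothetical.

```python
def lagrangian_update(lmbda, episode_cost, budget, lr=0.01):
    """Dual gradient-ascent step on the Lagrange multiplier.

    The multiplier increases when cumulative cost exceeds the safety
    budget and shrinks otherwise, adapting the penalty strength to the
    degree of constraint violation.
    """
    violation = episode_cost - budget
    # Project onto the non-negative reals: lambda must stay >= 0.
    return max(0.0, lmbda + lr * violation)


def imagined_cost(model_step, state, policy, horizon, gamma=0.99):
    """Estimate discounted cost by rolling the (learned) model forward.

    `model_step(state, action) -> (next_state, cost)` stands in for the
    world model; no real environment interaction occurs.
    """
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        state, cost = model_step(state, policy(state))
        total += discount * cost
        discount *= gamma
    return total


# Toy "world model": 1-D state, cost of 1 whenever |state| > 1.
def toy_model(state, action):
    nxt = state + action
    return nxt, 1.0 if abs(nxt) > 1.0 else 0.0


# Imagined rollout: policy halves the state each step, so cost is
# incurred only while the state remains outside the safe region.
est = imagined_cost(toy_model, 4.0, lambda s: -0.5 * s, horizon=5)

# Multiplier adapts over episodes: costs above the budget push it up,
# costs below let it decay.
lmbda, budget = 0.0, 25.0
for episode_cost in [40.0, 35.0, 30.0, 20.0, 10.0]:
    lmbda = lagrangian_update(lmbda, episode_cost, budget)
```

In practice the penalized objective optimized by the Safe Actor is `reward - lmbda * cost`, with the multiplier update running as an outer loop alongside policy learning.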