Monoped: A one-legged robot used as a canonical system to study hopping and dynamic locomotion
Proprioceptive: Sensing internal state (joint angles, velocities) rather than external state (cameras, lidar, contact sensors)
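PD Controller: Proportional-Derivative controller, a feedback law that computes a command from the tracking error and its rate of change to drive the error toward zero; often used in RL as an intermediate layer between the policy and the motors, but avoided here in favor of direct torque control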
SAC: Soft Actor-Critic, an off-policy RL algorithm that maximizes expected return plus a policy-entropy bonus, which encourages robust exploration
CMA-ES: Covariance Matrix Adaptation Evolution Strategy, a derivative-free optimization algorithm used here to tune simulation parameters so that simulated behavior matches the real robot
Energy Shaping: A control strategy that regulates the total energy (kinetic + potential) of a system to achieve a desired behavior (like hopping height)
gSDE: generalized State-Dependent Exploration, an exploration strategy in which the noise is a function of the state, leading to smoother actions than noise sampled independently at each step
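
To make the gSDE entry concrete, the following is a minimal sketch of the idea, not the implementation used in the work summarized here. It assumes the noise is a linear function of a state feature vector through a randomly drawn exploration matrix that is held fixed across several steps. All names (`sample_exploration_matrix`, `gsde_noise`, `independent_noise`) and the numeric values are hypothetical and chosen only for illustration.

```python
import random


def sample_exploration_matrix(n_features, n_actions, sigma, rng):
    """Draw one exploration matrix theta with i.i.d. Gaussian entries.

    In state-dependent exploration, theta is kept fixed for several
    steps (assumption: resampled only occasionally) instead of drawing
    fresh noise at every step.
    """
    return [[rng.gauss(0.0, sigma) for _ in range(n_actions)]
            for _ in range(n_features)]


def gsde_noise(features, theta):
    """State-dependent noise: eps[j] = sum_i features[i] * theta[i][j]."""
    n_actions = len(theta[0])
    return [sum(f * row[j] for f, row in zip(features, theta))
            for j in range(n_actions)]


def independent_noise(n_actions, sigma, rng):
    """Baseline: fresh Gaussian noise at every step, ignoring the state."""
    return [rng.gauss(0.0, sigma) for _ in range(n_actions)]


rng = random.Random(0)
theta = sample_exploration_matrix(n_features=3, n_actions=2, sigma=0.1, rng=rng)

# Hypothetical feature vectors for one state and a slightly different one.
state_features = [0.5, -1.0, 0.25]
nearby_features = [0.5, -1.0, 0.26]

eps_a = gsde_noise(state_features, theta)   # noise for a given state
eps_b = gsde_noise(state_features, theta)   # same state, same theta
eps_c = gsde_noise(nearby_features, theta)  # nearby state, same theta

step_1 = independent_noise(2, 0.1, rng)     # per-step noise, draw 1
step_2 = independent_noise(2, 0.1, rng)     # per-step noise, draw 2
```

While `theta` is held fixed, the same state always receives the same perturbation and nearby states receive nearby perturbations, so the resulting action sequence varies smoothly. The independent baseline produces an unrelated draw at every step, which shows up as jitter in the commanded torques.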