Stackelberg Game: A strategic game in which a 'leader' moves first and a 'follower' responds after observing the leader's move; the leader optimizes its choice by anticipating the follower's best response
PPO: Proximal Policy Optimization—an RL algorithm that improves stability by clipping the probability ratio between new and old policies
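Implicit Differentiation: A mathematical technique that computes the gradient of the optimal solution of an inner optimization problem (follower) with respect to the outer parameters (leader) by differentiating the inner problem's optimality conditions, rather than backpropagating through the inner solver's iterations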
Morphology: The physical structure of an agent, including topology, limb lengths, and joint configurations
SID: Stackelberg Implicit Differentiation—applying implicit differentiation specifically to the leader-follower dynamic in Stackelberg games
SMG: Stackelberg Markov Game—a sequential decision-making framework combining Markov Decision Processes with Stackelberg game structures
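Log-derivative technique: Also known as the REINFORCE trick or score-function estimator; a method to estimate the gradient of an expected value under a stochastic policy, including when the sampled operation is non-differentiable, by weighting the gradient of each sample's log-probability by its observed return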
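
To make the Implicit Differentiation and SID entries concrete, here is a minimal sketch on a toy scalar Stackelberg game. Everything in it is an illustrative assumption and not taken from the source: the quadratic follower objective g, the leader loss f, the constants A and C, and all function names are hypothetical. The follower's best response y*(x) satisfies dg/dy = 0, so the implicit function theorem gives dy*/dx = -(d2g/dy2)^(-1) * d2g/dydx, and the leader's total gradient is df/dx + df/dy * dy*/dx.

```python
# Toy Stackelberg game (hypothetical, for illustration only).
# Follower: minimize g(x, y) = 0.5 * (y - A*x)**2 + 0.5 * C * y**2 over y
# Leader:   minimize f(x, y) = (x - 1)**2 + (y - 2)**2 over x, with y = y*(x)

A, C = 2.0, 1.0  # assumed constants, not from the source


def follower_best_response(x):
    """Closed-form argmin over y of g(x, y), obtained by solving dg/dy = 0."""
    return A * x / (1.0 + C)


def leader_loss(x, y):
    return (x - 1.0) ** 2 + (y - 2.0) ** 2


def implicit_total_gradient(x):
    """Leader gradient that accounts for the follower's reaction."""
    y = follower_best_response(x)
    g_yy = 1.0 + C          # d^2 g / dy^2
    g_yx = -A               # d^2 g / dy dx
    dy_dx = -g_yx / g_yy    # implicit function theorem
    f_x = 2.0 * (x - 1.0)   # partial df/dx with y held fixed
    f_y = 2.0 * (y - 2.0)   # partial df/dy
    return f_x + f_y * dy_dx


def finite_difference_gradient(x, eps=1e-6):
    """Numerical check: differentiate the leader loss along the best response."""
    hi = leader_loss(x + eps, follower_best_response(x + eps))
    lo = leader_loss(x - eps, follower_best_response(x - eps))
    return (hi - lo) / (2.0 * eps)


if __name__ == "__main__":
    x0 = 0.5
    print(implicit_total_gradient(x0), finite_difference_gradient(x0))
```

In this toy case the best response has a closed form, so the implicit gradient can be checked directly against finite differences. In a learned setting the follower's optimum is only approximate and the second-derivative terms would themselves have to be estimated.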