JSBSim: an open-source, non-linear flight dynamics model that simulates realistic aircraft physics (6 degrees of freedom)
SPO: Simple Policy Optimization—a recent policy gradient algorithm that uses a probability ratio in the loss but relies on KL-divergence regularization rather than PPO's clipping
MA-SPO: Multi-Agent Simple Policy Optimization—the authors' adaptation of SPO to multi-agent settings using centralized critics
CTDE: Centralized Training with Decentralized Execution, a paradigm in which agents are trained with a critic that has access to global information, but each agent's actor selects actions using only its own local observations
HMARL: Hierarchical Multi-Agent Reinforcement Learning—structuring agents into layers, typically a manager (high-level) and workers (low-level)
League-Play: a training mechanism in which agents play against a mixed population of their own past versions or of diverse strategies, which prevents overfitting to a single opponent
POSMG: Partially Observable Semi-Markov Game, a game-theoretic model in which temporally extended actions (options) can last for a variable amount of time
PPO: Proximal Policy Optimization, a standard RL algorithm that prevents excessively large policy updates by clipping the probability ratio between the new and old policies in its surrogate objective
SAC: Soft Actor-Critic, an off-policy actor-critic RL algorithm that maximizes policy entropy alongside expected return
WEZ: Weapon Engagement Zone, the region relative to an aircraft within which its weapons can effectively hit a target
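To make the PPO and SPO entries concrete, the sketch below contrasts, for a single sampled action, PPO's clipped surrogate with a probability-ratio surrogate that is regularized by a KL penalty instead of being clipped. It is a minimal sketch under stated assumptions: the penalized form shown is the generic KL-penalty variant, not necessarily the exact SPO loss, and the function names, `eps`, and the fixed coefficient `beta` are hypothetical.

```python
import math


def ppo_clip_objective(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Per-sample PPO clipped surrogate (to be maximized)."""
    clipped_ratio = min(max(ratio, 1.0 - eps), 1.0 + eps)
    # The pessimistic minimum removes any gain from pushing the ratio outside [1-eps, 1+eps].
    return min(ratio * advantage, clipped_ratio * advantage)


def categorical_kl(p_old: list[float], p_new: list[float]) -> float:
    """KL(p_old || p_new) between two discrete action distributions."""
    return sum(po * math.log(po / pn) for po, pn in zip(p_old, p_new) if po > 0)


def kl_penalized_objective(ratio: float, advantage: float, kl: float,
                           beta: float = 1.0) -> float:
    """Per-sample ratio surrogate regularized by a KL penalty instead of clipping.
    Generic penalty form; `beta` is a hypothetical fixed coefficient."""
    return ratio * advantage - beta * kl


if __name__ == "__main__":
    p_old, p_new = [0.5, 0.5], [0.8, 0.2]  # old and new policies over two actions
    action, advantage = 0, 1.0
    ratio = p_new[action] / p_old[action]   # pi_new(a|s) / pi_old(a|s)
    print("clipped:", ppo_clip_objective(ratio, advantage))
    print("KL-penalized:",
          kl_penalized_objective(ratio, advantage, categorical_kl(p_old, p_new)))
```

With the default `eps = 0.2`, a ratio of 1.6 on a positive advantage is capped at 1.2 by clipping, whereas the penalized form keeps the full ratio and instead subtracts a cost that grows with the divergence between the two policies.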
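The League-Play entry can be illustrated by a minimal opponent pool that keeps frozen snapshots of past policies and, for each episode, pits the learner either against its latest version or against a uniformly sampled snapshot. The class name, the mixing probability `p_latest`, and the pool size are hypothetical; practical leagues often use more elaborate, performance-weighted matchmaking.

```python
import random


class League:
    """Hypothetical opponent pool: the latest policy plus frozen past snapshots."""

    def __init__(self, p_latest: float = 0.5, max_snapshots: int = 20, seed: int = 0):
        self.p_latest = p_latest          # chance of facing the current policy
        self.max_snapshots = max_snapshots
        self.snapshots: list[str] = []    # identifiers of frozen past policies
        self.rng = random.Random(seed)

    def add_snapshot(self, policy_id: str) -> None:
        self.snapshots.append(policy_id)
        if len(self.snapshots) > self.max_snapshots:
            self.snapshots.pop(0)         # drop the oldest snapshot

    def sample_opponent(self, latest_id: str) -> str:
        # Fall back to self-play when no snapshots exist yet.
        if not self.snapshots or self.rng.random() < self.p_latest:
            return latest_id
        return self.rng.choice(self.snapshots)


if __name__ == "__main__":
    league = League(p_latest=0.3, max_snapshots=3)
    for version in ["v1", "v2", "v3", "v4"]:
        league.add_snapshot(version)
    print([league.sample_opponent("v5") for _ in range(5)])
```

Mixing in older snapshots keeps the learner exposed to strategies it has already beaten, which is what discourages the overfitting to a single opponent described in the entry.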
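Because options in a POSMG last a variable number of steps, returns and bootstrapped targets are discounted by gamma**tau for an option of duration tau. The minimal sketch below shows this standard semi-Markov discounting; representing an option by its list of per-step rewards, and the function names, are assumptions for illustration.

```python
def option_reward(step_rewards: list[float], gamma: float) -> float:
    """Discounted reward collected while one option runs: sum_i gamma**i * r_i."""
    return sum((gamma ** i) * r for i, r in enumerate(step_rewards))


def smdp_return(option_rollouts: list[list[float]], gamma: float) -> float:
    """Return over a sequence of options of variable duration.

    Each option's accumulated reward is discounted by gamma**t, where t is the
    number of primitive steps that elapsed before the option started."""
    total, elapsed = 0.0, 0
    for step_rewards in option_rollouts:
        total += (gamma ** elapsed) * option_reward(step_rewards, gamma)
        elapsed += len(step_rewards)  # duration tau of this option
    return total


def smdp_td_target(step_rewards: list[float], gamma: float, next_value: float) -> float:
    """SMDP backup target for one option: R + gamma**tau * V(s')."""
    tau = len(step_rewards)
    return option_reward(step_rewards, gamma) + (gamma ** tau) * next_value


if __name__ == "__main__":
    rollouts = [[1.0, 1.0], [1.0]]  # a 2-step option followed by a 1-step option
    print(smdp_return(rollouts, gamma=0.5))  # 1.75
```

Discounting each option by the number of steps already elapsed yields the same value as flat per-step discounting over the primitive rewards, which is what keeps option-level backups consistent with the underlying time scale.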
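The entropy term in the SAC entry enters the critic through the soft state value V(s) = E[Q(s,a) - alpha * log pi(a|s)]. The sketch below computes this value and the resulting TD target for a discrete action space. This is a simplification, since SAC is most often applied to continuous actions, and the temperature `alpha` and the function names are hypothetical.

```python
import math


def entropy(probs: list[float]) -> float:
    """Shannon entropy H(pi(.|s)) of a discrete action distribution, in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def soft_state_value(q_values: list[float], probs: list[float], alpha: float) -> float:
    """Soft value V(s) = sum_a pi(a|s) * (Q(s,a) - alpha * log pi(a|s))."""
    return sum(p * (q - alpha * math.log(p)) for q, p in zip(q_values, probs) if p > 0)


def soft_td_target(reward: float, done: bool, gamma: float,
                   next_q: list[float], next_probs: list[float], alpha: float) -> float:
    """Critic target y = r + gamma * V(s'), with no bootstrap at terminal states."""
    if done:
        return reward
    return reward + gamma * soft_state_value(next_q, next_probs, alpha)


if __name__ == "__main__":
    q, pi = [1.0, 0.0], [0.5, 0.5]
    # The soft value equals the expected Q plus alpha times the policy entropy.
    print(soft_state_value(q, pi, alpha=0.2), 0.5 + 0.2 * entropy(pi))
```

Setting `alpha = 0` recovers the ordinary expected Q-value, so the temperature directly controls how much the critic rewards keeping the policy stochastic.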
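A real WEZ depends on missile kinematics, altitude, airspeed, and target aspect. Purely as a rough illustration of the WEZ entry, the sketch below models it as a 2-D cone ahead of the shooter's nose, bounded by a minimum and a maximum range. The `Aircraft2D` type, the range limits, and the 30-degree half-angle are hypothetical placeholders, not values from the source.

```python
import math
from dataclasses import dataclass


@dataclass
class Aircraft2D:
    x: float        # position east, metres
    y: float        # position north, metres
    heading: float  # radians, counter-clockwise from the +x axis


def in_simple_wez(shooter: Aircraft2D, target: Aircraft2D,
                  r_min: float = 500.0, r_max: float = 8000.0,
                  half_angle_deg: float = 30.0) -> bool:
    """Hypothetical cone-shaped WEZ: range band plus angle-off-nose limit."""
    dx, dy = target.x - shooter.x, target.y - shooter.y
    rng = math.hypot(dx, dy)
    if not (r_min <= rng <= r_max):
        return False
    bearing = math.atan2(dy, dx)
    # Angle between the shooter's nose and the line of sight, wrapped to [-pi, pi).
    off_nose = (bearing - shooter.heading + math.pi) % (2 * math.pi) - math.pi
    return abs(off_nose) <= math.radians(half_angle_deg)


if __name__ == "__main__":
    shooter = Aircraft2D(0.0, 0.0, 0.0)
    print(in_simple_wez(shooter, Aircraft2D(3000.0, 0.0, math.pi)))     # dead ahead
    print(in_simple_wez(shooter, Aircraft2D(3000.0, 3000.0, math.pi)))  # 45 deg off nose
```

Wrapping the angle off the nose keeps the check correct when the shooter's heading and the line-of-sight bearing lie on opposite sides of the 0/2*pi boundary.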