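DAgger: Dataset Aggregation, an iterative imitation learning algorithm in which the student policy collects its own data by acting in the environment, the expert labels the visited states with the actions it would have taken, and the student is retrained on the aggregate of all data collected so far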
PPO: Proximal Policy Optimization, a reinforcement learning algorithm that optimizes policies with a clipped surrogate objective, which limits how far each update can move the policy and keeps training stable
TCN: Temporal Convolutional Network, a neural network architecture that applies causal (typically dilated) 1D convolutions over a time sequence to capture temporal history
Asymmetric Critic: An RL architecture where the critic (value estimator) has access to privileged information (e.g., exact states) that the actor (policy) does not see
Covariate Shift: A situation where the distribution of input data during testing differs from training (e.g., a drone drifting to positions not seen in expert demonstrations)
Sim-to-Real: Transferring a policy trained in simulation to a physical robot
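BEM model: Blade Element Momentum theory, a physics model that combines blade element theory (forces on small radial sections of each blade) with momentum theory (the momentum imparted to the air passing through each annulus of the disk) to compute propeller thrust and torque for accurate aerodynamic simulation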
Privileged Information: Exact state data (position, velocity) available in simulation but not to the vision-based robot during deployment
Catastrophic Forgetting: A phenomenon where a neural network abruptly loses previously learned knowledge when trained on new data
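The DAgger loop in the glossary can be sketched in a few lines. This is a toy illustration under stated assumptions, not an implementation from the source: a hypothetical 1D system with dynamics x_{t+1} = x_t + a_t, a hypothetical expert controller a = -0.5x, and a linear student a = w·x fit by least squares. Iteration 0 rolls out the expert (a common initialization); every later iteration rolls out the current student, has the expert label the states the student actually visited, and refits on the full aggregated dataset.

```python
import random


def expert_action(x: float) -> float:
    """Hypothetical expert: a proportional controller that drives x toward 0."""
    return -0.5 * x


def fit_linear(dataset):
    """Least-squares fit of a = w * x (no bias) on the aggregated dataset."""
    num = sum(x * a for x, a in dataset)
    den = sum(x * x for x, _ in dataset)
    return num / den if den > 0 else 0.0


def rollout(policy, x0: float, steps: int = 20):
    """Roll out a policy on toy dynamics x_{t+1} = x_t + a_t; return visited states."""
    states, x = [], x0
    for _ in range(steps):
        states.append(x)
        x = x + policy(x)
    return states


def dagger(iterations: int = 5, rollouts_per_iter: int = 4, seed: int = 0):
    rng = random.Random(seed)
    dataset = []  # aggregated (state, expert_action) pairs, never discarded
    w = 0.0       # student policy parameter
    for i in range(iterations):
        # Iteration 0 rolls out the expert; afterwards the current student drives.
        policy = expert_action if i == 0 else (lambda x, w=w: w * x)
        for _ in range(rollouts_per_iter):
            x0 = rng.uniform(-5.0, 5.0)
            for x in rollout(policy, x0):
                # The expert labels states the *student* visited.
                dataset.append((x, expert_action(x)))
        w = fit_linear(dataset)  # retrain on ALL data collected so far
    return w, len(dataset)


w, n = dagger()
print(w, n)
```

Because the hypothetical expert is itself linear, the student recovers w = -0.5, and the dataset grows to 5 × 4 × 20 = 400 labeled states. In a realistic task the student cannot match the expert exactly, so later rollouts visit states produced by its own mistakes; labeling and keeping those states is how DAgger counters the covariate shift described in the glossary.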
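The clipped objective in the PPO entry is L^CLIP = E_t[min(r_t A_t, clip(r_t, 1 - ε, 1 + ε) A_t)], where r_t is the ratio of new to old action probabilities and A_t is the advantage estimate. The sketch below only evaluates that quantity for a batch of logged log-probabilities. The function name and the ε = 0.2 default are assumptions; an actual PPO update would maximize this value with automatic differentiation, usually together with a value-function loss and an entropy bonus.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, eps=0.2):
    """Mean clipped surrogate L^CLIP over a batch (a quantity to be maximized)."""
    if not (len(new_logp) == len(old_logp) == len(advantages)) or not advantages:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)                # r_t = pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)  # clip(r_t, 1 - eps, 1 + eps)
        total += min(ratio * adv, clipped * adv)         # pessimistic (lower) of the two
    return total / len(advantages)


# Ratio 1.5 with positive advantage is capped at 1.2 * A.
print(ppo_clip_objective([math.log(1.5)], [0.0], [1.0]))
```

Taking the minimum makes the objective pessimistic: with a positive advantage, raising the ratio past 1 + ε earns nothing more, and with a negative advantage, lowering it below 1 - ε earns nothing more, so a single update has no incentive to move the policy far from the one that collected the data.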
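The core TCN operation is a causal, dilated 1D convolution: output step t only reads inputs at t, t - d, t - 2d, and so on, never future steps. The pure-Python sketch below stacks such layers with dilations 1, 2, 4. The weights and dilation schedule are illustrative assumptions, and the residual connections, nonlinearities, and normalization used in practical TCNs are omitted for clarity.

```python
def causal_conv1d(x, weights, dilation=1):
    """y[t] = sum_k weights[k] * x[t - k * dilation].

    Indices before 0 are treated as zero padding, so y[t] depends only on
    x[0..t] (causal). weights[0] multiplies the current sample."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y


def tcn_stack(x, weights, dilations=(1, 2, 4)):
    """Apply causal convolutions with growing dilation (linear layers only, for clarity)."""
    for d in dilations:
        x = causal_conv1d(x, weights, d)
    return x


def receptive_field(kernel_size, dilations):
    """Number of past steps (including the current one) an output can see."""
    return 1 + (kernel_size - 1) * sum(dilations)


impulse = [1.0] + [0.0] * 9
print(tcn_stack(impulse, [1.0, 1.0]))
print(receptive_field(2, (1, 2, 4)))
```

With kernel size 2 and dilations (1, 2, 4), the receptive field is 1 + (2 - 1)(1 + 2 + 4) = 8 steps, so an impulse at t = 0 influences exactly the first 8 outputs. Doubling the dilation at each layer grows the history window exponentially with depth while the parameter count grows only linearly.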
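The asymmetry in the Asymmetric Critic entry is purely a matter of inputs. In the sketch below, which uses hypothetical linear actor and critic classes and illustrative field names, the actor's signature accepts only the observation the real robot has, while the critic also receives privileged simulator state and is used only to compute training signals, here a one-step TD advantage A = r + γV(s') - V(s).

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """What the deployed robot sees (hypothetical camera-derived features)."""
    features: list[float]


@dataclass
class PrivilegedState:
    """Simulator-only ground truth (hypothetical fields)."""
    position: list[float]
    velocity: list[float]


def dot(weights, inputs):
    if len(weights) != len(inputs):
        raise ValueError("weight/input size mismatch")
    return sum(w * x for w, x in zip(weights, inputs))


class Actor:
    """Policy: reads only the observation, so it can run on the physical robot."""

    def __init__(self, weights):
        self.weights = weights

    def act(self, obs: Observation) -> float:
        return dot(self.weights, obs.features)


class Critic:
    """Value estimator: also reads privileged state; used only during training in sim."""

    def __init__(self, weights):
        self.weights = weights

    def value(self, obs: Observation, priv: PrivilegedState) -> float:
        inputs = obs.features + priv.position + priv.velocity
        return dot(self.weights, inputs)


def td_advantage(critic, obs, priv, reward, next_obs, next_priv, gamma=0.99):
    """One-step TD advantage computed with the privileged critic."""
    return reward + gamma * critic.value(next_obs, next_priv) - critic.value(obs, priv)
```

At deployment only `Actor.act` is kept. Because it never reads `PrivilegedState`, the absence of ground-truth position and velocity on the physical robot does not affect it; the privileged information only sharpens the advantage estimates that guide training.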
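For the BEM entry, a minimal hover case shows how the two theories combine. Assume constant pitch θ and chord (solidity σ), a linear lift curve with slope a, no tip or root losses, zero climb speed, and radius r normalized by the tip radius. On each annulus, blade element theory gives dC_T = (σa/2)(θr² - λr) dr and momentum theory gives dC_T = 4λ²r dr; setting them equal yields the local inflow ratio λ(r) = (σa/16)[sqrt(1 + 32θr/(σa)) - 1]. The function names and numbers below are illustrative, and a propeller model for simulation would typically add forward-flight inflow, tip-loss corrections, and airfoil data tables.

```python
import math


def inflow_ratio(r, sigma_a, pitch):
    """Local inflow ratio lambda(r) from equating blade-element and momentum thrust (hover)."""
    return (sigma_a / 16.0) * (math.sqrt(1.0 + 32.0 * pitch * r / sigma_a) - 1.0)


def dct_momentum(r, lam, dr):
    """Momentum theory on an annulus (hover, no tip loss): dC_T = 4 lambda^2 r dr."""
    return 4.0 * lam * lam * r * dr


def dct_blade_element(r, lam, sigma_a, pitch, dr):
    """Blade element theory (linear lift, small angles): dC_T = (sigma a / 2)(theta r^2 - lambda r) dr."""
    return 0.5 * sigma_a * (pitch * r * r - lam * r) * dr


def bemt_hover_ct(solidity, lift_slope, pitch, n_stations=200):
    """Thrust coefficient C_T by summing dC_T over annuli at midpoints of r in (0, 1]."""
    if pitch < 0 or solidity <= 0 or lift_slope <= 0:
        raise ValueError("sketch assumes pitch >= 0 and positive solidity and lift slope")
    sigma_a = solidity * lift_slope
    dr = 1.0 / n_stations
    ct = 0.0
    for i in range(n_stations):
        r = (i + 0.5) * dr
        lam = inflow_ratio(r, sigma_a, pitch)
        ct += dct_momentum(r, lam, dr)  # equals the blade-element value by construction
    return ct


print(bemt_hover_ct(solidity=0.1, lift_slope=5.73, pitch=0.2))
```

For σ = 0.1, a = 5.73 per radian, and θ = 0.2 rad, the two dC_T expressions agree at every station, which is exactly the condition BEM imposes. In the rotor convention used here, dimensional thrust then follows from T = C_T ρ A (ΩR)².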