FPC: Forgetting of Pre-trained Capabilities—the phenomenon where an RL agent loses skills acquired during pre-training because it does not visit the relevant states early in fine-tuning
State Coverage Gap: A specific instance of FPC where the agent operates in 'Close' states (near the start of the task) and forgets how to act in 'Far' states (later in the task) before ever reaching them
Imperfect Cloning Gap: An instance of FPC where slight differences between the pre-trained model and the optimal policy lead to distribution shift and subsequent forgetting
EWC: Elastic Weight Consolidation—a regularization method that penalizes changes to important network parameters (identified by the Fisher information matrix) to prevent forgetting
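As a concrete illustration of the EWC penalty described above, here is a minimal sketch of the quadratic regularizer; the function name and the diagonal representation of the Fisher information are illustrative choices, not taken from the source:

```python
def ewc_penalty(params, old_params, fisher, lam=1.0):
    """EWC regularizer: (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2.

    `fisher` holds a diagonal Fisher information estimate, one importance
    weight per parameter; a large F_i means changing parameter i degrades
    the pre-trained behavior more. All names here are illustrative.
    """
    return 0.5 * lam * sum(
        f * (p - q) ** 2 for p, q, f in zip(params, old_params, fisher)
    )
```

During fine-tuning this term is added to the RL loss, so parameters that the Fisher estimate marks as important are pulled back toward their pre-trained values while unimportant ones move freely.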
BC: Behavioral Cloning—in this context, an auxiliary loss that forces the policy to stay close to the pre-trained policy's output on a set of replay buffer states
KS: Kickstarting—a distillation method where the student policy is regularized to stay close to the teacher (pre-trained) policy on states visited by the student
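Both BC and KS amount to adding a distillation term that keeps the fine-tuned policy near the frozen pre-trained one; they differ mainly in where the states come from (replay-buffer states for BC, the student's own visited states for KS). A minimal single-state sketch for discrete actions, with illustrative names and assuming a KL-based penalty:

```python
import math

def distill_loss(student_logits, teacher_logits, coef=1.0):
    """coef * KL(teacher || student) for one state with discrete actions.

    `teacher_logits` come from the frozen pre-trained policy; in a real
    training loop the gradient flows only through the student.
    """
    def log_softmax(xs):
        m = max(xs)
        z = math.log(sum(math.exp(x - m) for x in xs))
        return [x - m - z for x in xs]

    t = log_softmax(teacher_logits)
    s = log_softmax(student_logits)
    return coef * sum(math.exp(tl) * (tl - sl) for tl, sl in zip(t, s))
```

The loss is zero when the two policies agree on the state and grows as the student drifts away, which is what counteracts forgetting.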
APPO: Asynchronous Proximal Policy Optimization—an efficient, distributed version of the PPO reinforcement learning algorithm
SAC: Soft Actor-Critic—an off-policy RL algorithm that maximizes a trade-off between expected return and entropy
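The return–entropy trade-off in SAC's objective can be sketched per batch of sampled transitions, estimating entropy as -log pi(a|s) for the sampled actions; all names here are illustrative:

```python
def soft_objective(rewards, action_log_probs, alpha=0.2):
    """Mean of r_t + alpha * H_t over a batch, with the entropy term
    estimated per sample as -log pi(a_t | s_t); `alpha` is the
    temperature trading off expected return against entropy."""
    return sum(
        r - alpha * lp for r, lp in zip(rewards, action_log_probs)
    ) / len(rewards)
```

With alpha = 0 this reduces to the plain mean reward; raising alpha rewards more stochastic (higher-entropy) policies.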
RND: Random Network Distillation—an exploration-bonus method that rewards visiting unfamiliar states; the bonus is a trained predictor network's error at matching the output of a fixed, randomly initialized network
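A minimal sketch of the RND bonus, shrinking both networks to single linear maps (all names and sizes are illustrative): the predictor is trained elsewhere to match a fixed random network, and its remaining error on a state serves as the exploration bonus.

```python
import random

random.seed(0)
DIM = 4
# Fixed, randomly initialized "target" network -- never trained.
w_target = [random.gauss(0.0, 1.0) for _ in range(DIM)]
# Predictor network, trained (in the real algorithm) to match the target.
w_pred = [0.0] * DIM

def rnd_bonus(state):
    """Squared prediction error of the predictor against the fixed random
    network; high for states the predictor has not been trained on."""
    err = sum(wp * x for wp, x in zip(w_pred, state)) \
        - sum(wt * x for wt, x in zip(w_target, state))
    return err ** 2
```

As the predictor fits frequently visited states, their bonus decays toward zero, steering the agent toward novel states it would otherwise never reach.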