CRL: Continual Reinforcement Learning—learning a sequence of tasks without forgetting previous ones
JAX: A Python library for high-performance numerical computing that supports Just-In-Time (JIT) compilation and automatic differentiation
JIT: Just-In-Time compilation—optimizing and compiling code into machine language at runtime for faster execution
Potential-based shaping: A method of modifying rewards by adding a discounted difference of potentials, F(s, s') = gamma * Phi(s') - Phi(s) with discount factor gamma, which guarantees that the optimal policy remains unchanged
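PPO: Proximal Policy Optimization, a popular reinforcement learning algorithm that stabilizes training by keeping each policy update close to the previous policy, typically by clipping the probability ratio between the new and old policies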
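SI: Synaptic Intelligence, a continual learning method that estimates each parameter's importance online from its contribution to reducing the loss during training, then penalizes changes to important parameters to prevent forgetting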
Coreset: A small, representative subset of data retained from previous tasks to approximate the full dataset distribution
Adversarial push-back: A regularization technique that pushes a model toward low-confidence predictions, or toward a prior distribution, on inputs that differ from the training data
MDP: Markov Decision Process—a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of a decision maker
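The potential-based shaping entry can be checked numerically. With discount factor gamma, the shaping term added to each reward is gamma * Phi(s') - Phi(s), and along any trajectory these terms telescope: the shaped discounted return equals the original return plus gamma^T * Phi(s_T) - Phi(s_0). If Phi is 0 at terminal states, the difference is just -Phi(s_0), which depends only on the start state, so the ranking of policies is unchanged. The sketch below is plain Python; the 1-D chain, the potential `phi` (negative distance to a goal), the trajectory, and `gamma = 0.9` are all hypothetical choices made for illustration, not taken from the source.

```python
from typing import Callable, Sequence


def shaped_rewards(
    states: Sequence[int],
    rewards: Sequence[float],
    phi: Callable[[int], float],
    gamma: float,
) -> list[float]:
    """Add the potential-based term F(s, s') = gamma * Phi(s') - Phi(s) to each reward.

    states has one more entry than rewards: rewards[t] is received on the
    transition states[t] -> states[t + 1].
    """
    assert len(states) == len(rewards) + 1
    return [
        r + gamma * phi(s_next) - phi(s)
        for r, s, s_next in zip(rewards, states[:-1], states[1:])
    ]


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    # G = sum_t gamma**t * r_t
    return sum(gamma**t * r for t, r in enumerate(rewards))


# Hypothetical environment: a 1-D chain whose terminal goal is position 5.
GOAL = 5


def phi(s: int) -> float:
    # Hypothetical potential: negative distance to the goal (0 at the terminal goal).
    return -abs(GOAL - s)


if __name__ == "__main__":
    gamma = 0.9
    states = [0, 1, 2, 1, 2, 3, 4, 5]  # a trajectory that ends at the goal
    rewards = [0.0] * 6 + [1.0]        # sparse reward: only on reaching the goal
    shaped = shaped_rewards(states, rewards, phi, gamma)
    T = len(rewards)
    gap = discounted_return(shaped, gamma) - discounted_return(rewards, gamma)
    # Telescoping identity: gap == gamma**T * Phi(s_T) - Phi(s_0)
    print(gap, gamma**T * phi(states[-1]) - phi(states[0]))
```

Because the gap depends only on the endpoints of the trajectory, the shaping densifies the learning signal (every step toward the goal now earns a positive term) without changing which policy is best.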
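The PPO entry's idea of limiting policy updates is most commonly realized with the clipped surrogate objective, computed per sample as min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A), where ratio = pi_new(a|s) / pi_old(a|s) and A is the advantage estimate. The minimal plain-Python sketch below assumes that clipped variant (PPO also has a KL-penalty variant) and a clip range `eps = 0.2`, a common default rather than a value from the source; the function names are hypothetical.

```python
import math


def clipped_surrogate(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Per-sample PPO clipped objective (to be maximized).

    ratio = pi_new(a|s) / pi_old(a|s); eps is the clip range.
    """
    clipped_ratio = min(max(ratio, 1.0 - eps), 1.0 + eps)
    # Taking the minimum makes the objective a pessimistic bound: moving the
    # ratio outside [1 - eps, 1 + eps] can never increase it.
    return min(ratio * advantage, clipped_ratio * advantage)


def ratio_from_log_probs(logp_new: float, logp_old: float) -> float:
    # In practice the ratio is computed from log-probabilities for numerical stability.
    return math.exp(logp_new - logp_old)


if __name__ == "__main__":
    # With a positive advantage, the objective stops growing once the ratio
    # exceeds 1 + eps, so there is no incentive to move far from the old policy.
    for r in (0.5, 1.0, 1.2, 1.5, 3.0):
        print(r, clipped_surrogate(r, advantage=1.0))
```

With a negative advantage the clip acts on the other side: lowering the ratio below 1 - eps earns nothing more, while raising it is still fully penalized.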