Context Channel Capacity ($C_{ctx}$): The maximum mutual information between a continual learning (CL) architecture's context signal and the parameters it uses for prediction; it sets the upper bound on task retention
HyperNetwork: A neural network that generates the weights for another network (the target network) based on an input (context) embedding
Impossibility Triangle: Theorem stating that zero forgetting, online learning, and finite parameters cannot be simultaneously satisfied by sequential state-based learners
P5 (Wrong-Context Probing): A diagnostic protocol in which a model is evaluated with deliberately incorrect context signals; a large accuracy drop indicates that the model actually uses its context
Task Identity Entropy ($H(T)$): The amount of information required to uniquely identify the current task ($\log_2 K$ bits for $K$ equiprobable tasks)
Catastrophic Forgetting: The abrupt loss of previously acquired knowledge when a neural network learns new tasks sequentially
Paradigm A (State Protection): Methods such as Elastic Weight Consolidation (EWC) and Synaptic Intelligence (SI) that try to protect specific parameter values from changing; proven to have $C_{ctx}=0$
Paradigm C (Conditional Regeneration): Methods such as HyperNetworks that regenerate parameters from the context signal on each use; the only paradigm capable of zero forgetting
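
As a rough illustration of how three of the terms above fit together (HyperNetwork, Wrong-Context Probing, and Task Identity Entropy), here is a minimal toy sketch in plain Python. It is not the architecture or protocol of the source. The class `ToyHyperNetwork`, the helpers `one_hot` and `accuracy`, the slope values, and the tolerance are all hypothetical names and choices made for this example. The "target network" is a single weight $w$ in $y = w x$, and the "hypernetwork" is a linear map from a one-hot context to that weight.

```python
import math


def task_identity_entropy(num_tasks: int) -> float:
    """H(T) = log2(K) bits for K equiprobable tasks."""
    return math.log2(num_tasks)


def one_hot(index: int, size: int) -> list:
    """One-hot context embedding for task `index` (illustrative choice)."""
    return [1.0 if i == index else 0.0 for i in range(size)]


class ToyHyperNetwork:
    """Hypothetical linear generator: maps a context vector to the single
    weight w of a target model y = w * x."""

    def __init__(self, num_tasks: int):
        self.table = [0.0] * num_tasks

    def generate(self, context: list) -> float:
        # The target weight is regenerated from the context on every call.
        return sum(t * c for t, c in zip(self.table, context))

    def fit_task(self, task_id: int, slope: float) -> None:
        # Closed-form stand-in for training: only the entry addressed by
        # this task's one-hot context is written.
        self.table[task_id] = slope


def accuracy(hyper, slopes, context_for_task, xs, tol=1e-6) -> float:
    """Fraction of predictions within `tol` of the true value, where task k
    is evaluated under the context chosen by `context_for_task(k)`."""
    num_tasks = len(slopes)
    correct = 0
    total = 0
    for task_id, slope in enumerate(slopes):
        context = one_hot(context_for_task(task_id), num_tasks)
        w = hyper.generate(context)
        for x in xs:
            correct += abs(w * x - slope * x) <= tol
            total += 1
    return correct / total


# Assumed toy setup: K = 4 regression tasks y = slope * x, learned in sequence.
slopes = [1.0, -2.0, 0.5, 3.0]
xs = [1.0, 2.0, 3.0]
hyper = ToyHyperNetwork(len(slopes))
for k, s in enumerate(slopes):
    hyper.fit_task(k, s)

bits_needed = task_identity_entropy(len(slopes))
right_acc = accuracy(hyper, slopes, lambda k: k, xs)
# Wrong-context probe: feed each task the context of the next task.
wrong_acc = accuracy(hyper, slopes, lambda k: (k + 1) % len(slopes), xs)
accuracy_drop = right_acc - wrong_acc
```

In this toy setting the correct context reproduces every task, while the shifted context reproduces none, so the probe reports the largest possible drop. A model that ignored its context would show no drop under the same probe, which is the case the P5 diagnostic is meant to expose.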