OCL: Online Continual Learning—a training regime where data arrives in a stream and each sample is processed only once (no multiple epochs).
ViT: Vision Transformer—a deep learning model for image processing based on the attention mechanism.
RwF: Routing without Forgetting—the proposed architecture using energy-based routing.
HopfieldPooling: A layer type based on Modern Hopfield Networks that compresses a sequence of inputs into a summary (or prompts) via associative retrieval.
Energy-Based Model: A framework where inference is viewed as minimizing a scalar energy function; here, the attention mechanism is the energy minimization step.
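Gibbs distribution: The probability distribution, proportional to exp(-E/T) for energy E and temperature T, that minimizes free energy (expected energy minus temperature-scaled entropy); in this paper, the softmax attention weights represent this distribution.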
Catastrophic Forgetting: The tendency of neural networks to drastically lose performance on previously learned tasks when trained on new ones.
LoRA: Low-Rank Adaptation, a technique that fine-tunes a model by freezing its pretrained weights and training only a pair of small low-rank matrices whose product is added to each adapted weight matrix.
Class-IL: Class-Incremental Learning—a setting where the model must distinguish between all classes seen so far without knowing the task ID.
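
The Energy-Based Model, Gibbs distribution, and HopfieldPooling entries above can be tied together with a small numerical sketch. It is only an illustration under stated assumptions, not the paper's RwF layer: the function names (`softmax`, `free_energy`, `hopfield_pool`), the single query vector, and the inverse temperature `beta` are all hypothetical, and the learned projections a real pooling layer would have are omitted. It shows that softmax attention weights are the Gibbs distribution over similarity scores, and that they attain the minimum of the free energy F(p) = -<p, s> + (1/beta) * sum_i p_i log p_i.

```python
import numpy as np

def softmax(scores, beta=1.0):
    """Gibbs distribution over scores at inverse temperature beta."""
    z = beta * scores
    z = z - z.max()              # shift for numerical stability
    w = np.exp(z)
    return w / w.sum()

def free_energy(p, scores, beta=1.0):
    """Expected energy (-scores) minus entropy scaled by 1/beta."""
    p = np.clip(p, 1e-12, 1.0)   # avoid log(0)
    return float(-(p @ scores) + (p * np.log(p)).sum() / beta)

def hopfield_pool(query, tokens, beta=1.0):
    """Compress a (n_tokens, dim) sequence into one (dim,) summary.

    Assumption: one fixed query and no learned projections, which is a
    simplification of a Hopfield-style pooling layer.
    """
    scores = tokens @ query            # similarity of each token to the query
    weights = softmax(scores, beta)    # associative retrieval weights
    return weights @ tokens, weights   # weighted summary of the sequence

rng = np.random.default_rng(0)
tokens = rng.normal(size=(5, 4))       # toy sequence: 5 tokens of dimension 4
query = rng.normal(size=4)
summary, weights = hopfield_pool(query, tokens, beta=2.0)
print(summary.shape, weights.sum())
```

Any other probability vector over the five tokens gives a free energy at least as large as the one obtained with `weights`, which is the sense in which the attention step performs the energy minimization.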