LoRA: Low-Rank Adaptation, a technique that fine-tunes large models by training only small low-rank matrices while keeping the base model frozen
Semantic ID: A method of representing items as sequences of tokens derived from their semantic features (e.g., title/description), often using a hierarchical tokenizer
Proximal Regularizer: A penalty term in the loss function that keeps the new model parameters close to the previous version to prevent drastic changes (forgetting)
Cumulative LoRA: A family of methods that freeze past adapters and sum them with a new trainable adapter; popular in vision but shown here to be harmful for recommendation
Plasticity: The ability of the model to learn new patterns (e.g., a user's new interest in cooking)
Stability: The ability of the model to retain useful old patterns (e.g., a user's long-term interest in sci-fi)
Softmax-KL Proximal: A specific proximal regularizer that penalizes the KL divergence between the softmax-normalized adapter weights of the new model and those of the previous version, preserving the structural distribution of the weights
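
To make several of the terms above concrete, here is a minimal NumPy sketch of a LoRA forward pass, a cumulative-LoRA update, and two proximal penalties. It is an illustration under stated assumptions, not the method's actual implementation: the function names, the `scale` parameter, the choice to flatten each weight matrix before the softmax, and the direction of the KL divergence (previous relative to new) are all hypothetical choices made for this example.

```python
import numpy as np


def lora_forward(x, W, A, B, scale=1.0):
    """LoRA: frozen base weight W plus a trainable low-rank update B @ A.

    Shapes: x (batch, d_in), W (d_out, d_in), A (r, d_in), B (d_out, r).
    """
    return x @ (W + scale * (B @ A)).T


def cumulative_delta(frozen_adapters, new_adapter):
    """Cumulative LoRA: sum frozen past updates with one new trainable update.

    Each adapter is a (B, A) pair; only new_adapter would receive gradients.
    """
    B_new, A_new = new_adapter
    delta = B_new @ A_new
    for B_old, A_old in frozen_adapters:
        delta = delta + B_old @ A_old
    return delta


def softmax(v):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(v - v.max())
    return e / e.sum()


def l2_proximal(theta_new, theta_prev):
    """Plain proximal regularizer: squared distance to the previous weights."""
    return float(np.sum((theta_new - theta_prev) ** 2))


def softmax_kl_proximal(theta_new, theta_prev, eps=1e-12):
    """Softmax-KL proximal (assumed form): KL(softmax(prev) || softmax(new)).

    Flattening the whole matrix and the KL direction are assumptions.
    """
    p = softmax(theta_prev.ravel())
    q = softmax(theta_new.ravel())
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))
```

In training, a penalty like these would be added to the task loss with a weighting coefficient, so that a larger coefficient favors stability and a smaller one favors plasticity. One property worth noting: because softmax is invariant to adding a constant to every entry, the softmax-KL penalty in this sketch ignores a uniform shift of the weights and reacts only to changes in their relative pattern, whereas the squared-distance penalty reacts to both.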