SR: Sequential Recommendation—predicting the next item a user will interact with based on their history
RKHS: Reproducing Kernel Hilbert Space—a space of functions where evaluation is a continuous linear functional, allowing probability distributions to be embedded as points (mean embeddings)
MMD: Maximum Mean Discrepancy—a distance between two probability distributions, defined as the RKHS norm of the difference between their mean embeddings; commonly used as the statistic in kernel two-sample tests
MK-MMD: Multi-Kernel Maximum Mean Discrepancy—an extension of MMD that replaces the single kernel with a weighted (typically convex) combination of multiple kernels, so that differences at several scales of the data are captured
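Characteristic Kernel: A kernel (such as the Gaussian kernel) whose mean embedding map is injective, so that MMD = 0 if and only if the two distributions are identical; the embedding distinguishes distributions that differ in any statistical moment, not only in low-order ones such as the mean or variance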
Catastrophic Forgetting: A phenomenon where a model forgets previously learned information (e.g., collaborative patterns) while learning a new task (e.g., semantic alignment)
MoE: Mixture of Experts—an architecture where different sub-models ('experts') specialize in different parts of the input space, activated by a gating network
SASRec: Self-Attentive Sequential Recommendation—a standard Transformer-based baseline model for sequential recommendation
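
As an illustration of the MMD and MK-MMD entries above, the following is a minimal sketch of a sample-based estimate, not the implementation used by any particular paper. It assumes Gaussian kernels, the biased (V-statistic) estimator, and equal kernel weights; the function names, bandwidths, and toy data are all hypothetical choices made for the example.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def multi_kernel(bandwidths, weights):
    """Weighted sum of Gaussian kernels (convex if weights are >= 0 and sum to 1)."""
    def kernel(x, y):
        return sum(w * gaussian_kernel(x, y, s)
                   for w, s in zip(weights, bandwidths))
    return kernel


def mmd2(xs, ys, kernel):
    """Biased estimate of squared MMD: E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)]."""
    k_xx = sum(kernel(a, b) for a in xs for b in xs) / (len(xs) ** 2)
    k_yy = sum(kernel(a, b) for a in ys for b in ys) / (len(ys) ** 2)
    k_xy = sum(kernel(a, b) for a in xs for b in ys) / (len(xs) * len(ys))
    return k_xx + k_yy - 2.0 * k_xy


def sample(rng, mean, n, dim=2):
    """Draw n points from an isotropic unit-variance Gaussian (toy data)."""
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
p_a = sample(rng, 0.0, 100)   # two independent samples from the same distribution
p_b = sample(rng, 0.0, 100)
q = sample(rng, 2.0, 100)     # a sample from a shifted distribution

# Hypothetical choice: three bandwidths with equal weights.
kernel = multi_kernel([0.5, 1.0, 2.0], [1 / 3, 1 / 3, 1 / 3])

same = mmd2(p_a, p_b, kernel)  # small: both samples come from one distribution
diff = mmd2(p_a, q, kernel)    # larger: the distributions differ in their mean
print(f"same-distribution MMD^2: {same:.4f}")
print(f"shifted-distribution MMD^2: {diff:.4f}")
```

Mixing several bandwidths means the estimate does not depend on one bandwidth matching the scale at which the two samples differ, which is the motivation the MK-MMD entry describes. In practice the kernel weights are often learned or tuned rather than fixed as they are here.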