PLM: Pretrained Language Model—models like BERT or RoBERTa trained on massive text corpora to understand language semantics
Poly-attention: Also known as codebook-based attention; a mechanism that uses a set of learnable query vectors (codes) to extract multiple distinct representations from a sequence
LLM: Large Language Model—generative models like Llama-2 used here to summarize user history
AUC: Area Under the ROC Curve; the probability that a classifier scores a randomly chosen positive (clicked) item above a randomly chosen negative (not clicked) item
NCE loss: Noise Contrastive Estimation loss—a training objective that teaches the model to distinguish the true target item from randomly sampled negative items
Session-based encoding: Breaking a long sequence into smaller chunks (sessions) to be encoded separately, reducing computational complexity
Attention sparsity: Restricting the attention mechanism to attend only to specific tokens (e.g., neighbors or global markers) rather than all tokens, reducing compute cost
nDCG: Normalized Discounted Cumulative Gain—a ranking metric that gives more credit for correctly ranking highly relevant items at the top of the list
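
To make the two evaluation metrics defined above (AUC and nDCG) concrete, here is a minimal Python sketch that computes both for a single impression list. It is an illustration under stated assumptions, not the evaluation code of any particular paper: labels are assumed binary (1 = clicked, 0 = not clicked), DCG uses the linear-gain form rel / log2(rank + 1), and the function names `auc`, `dcg_at_k`, and `ndcg_at_k` are hypothetical.

```python
import math
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Pairwise AUC: share of (positive, negative) pairs where the positive
    item gets the higher score. Tied scores count as half a correct pair."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative item")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k positions.
    Assumption: linear gain, discount log2(rank + 1) with 1-indexed ranks."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """nDCG@k: DCG of the model's ranking divided by DCG of the ideal ranking."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = [labels[i] for i in order]      # labels in predicted order
    ideal = sorted(labels, reverse=True)     # best possible order
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(ranked, k) / ideal_dcg if ideal_dcg > 0 else 0.0


if __name__ == "__main__":
    # Toy impression: four candidate items, two of which were clicked.
    scores = [0.9, 0.2, 0.6, 0.4]
    labels = [0, 1, 1, 0]
    print(f"AUC    = {auc(scores, labels):.4f}")
    print(f"nDCG@4 = {ndcg_at_k(scores, labels, 4):.4f}")
```

In this toy impression only one of the four (clicked, not clicked) pairs is ordered correctly, so AUC is 0.25, while nDCG@4 is higher because one clicked item still lands in the second position. The pairwise AUC here is quadratic in list length, which is fine for short impression lists; a rank-based formulation would be preferable for large ones.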