UPE: User Poly-Embedding—representing a user with multiple vectors to capture diverse interests (e.g., one vector for 'sports', one for 'cooking')
CPE: Content Poly-Embedding—representing a single item (like a news article) with multiple vectors to capture its different aspects
Poly-attention: An attention mechanism that extracts m global feature vectors from a sequence using m learnable query codes
NCE Loss: Noise Contrastive Estimation—a loss function that trains models to distinguish positive samples (real user clicks) from negative samples (non-clicks)
PLM: Pretrained Language Model—models like BERT or T5 trained on large text corpora
Mixtral: A sparse mixture-of-experts large language model (LLM), used here to generate the ground-truth summaries that supervise training
T5: Text-to-Text Transfer Transformer—an encoder-decoder model architecture used as the backbone for EmbSum
AUC: Area Under the ROC Curve—a metric measuring the ability of the model to distinguish between clicked and non-clicked items
nDCG: Normalized Discounted Cumulative Gain—a ranking metric that gives higher weight to correct items appearing at the top of the list
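The two evaluation metrics defined above can be made concrete with a small sketch. This is an illustration only, assuming binary click labels (1 = clicked, 0 = not clicked) and a gain of `rel / log2(rank + 1)` with ranks starting at 1; the function names `auc`, `dcg_at_k`, and `ndcg_at_k` are hypothetical and not taken from the source.

```python
import math

def auc(labels, scores):
    """Fraction of (clicked, non-clicked) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one clicked and one non-clicked item")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def dcg_at_k(ranked_labels, k):
    """Discounted cumulative gain over the top-k labels, already in rank order."""
    # enumerate() is 0-based, so position i has rank i + 1 and discount log2(i + 2).
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(ranked_labels[:k]))

def ndcg_at_k(labels, scores, k):
    """DCG of the model's ranking divided by the DCG of the ideal ranking."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = [labels[i] for i in order]
    ideal_dcg = dcg_at_k(sorted(labels, reverse=True), k)
    return dcg_at_k(ranked, k) / ideal_dcg if ideal_dcg > 0 else 0.0

if __name__ == "__main__":
    labels = [0, 1, 0, 1, 0]            # toy impression list, made up for illustration
    scores = [0.9, 0.8, 0.3, 0.2, 0.1]  # model scores for the same five items
    print(f"AUC    = {auc(labels, scores):.3f}")
    print(f"nDCG@5 = {ndcg_at_k(labels, scores, 5):.3f}")
```

With binary labels, the alternative gain `2**rel - 1` gives the same values, so the choice between the two conventions only matters for graded relevance.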
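The poly-attention entry above describes m learnable query codes pulling m vectors out of one sequence. A minimal NumPy sketch of that idea follows. The `tanh` projection of the hidden states, the parameter names (`hidden`, `codes`, `proj`), and the shapes are assumptions chosen for illustration, and the random arrays stand in for learned parameters; the source does not specify the exact scoring function.

```python
import numpy as np

def poly_attention(hidden, codes, proj):
    """Pool a sequence into m vectors, one per query code.

    hidden: (n, d) token or item representations
    codes:  (m, p) query codes (learned in a real model; random here)
    proj:   (d, p) projection applied to hidden states before scoring (assumed)
    Returns (pooled, weights) with shapes (m, d) and (m, n).
    """
    keys = np.tanh(hidden @ proj)                  # (n, p)
    scores = codes @ keys.T                        # (m, n): one score row per code
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)  # softmax over positions
    pooled = weights @ hidden                      # (m, d): weighted sums of hidden states
    return pooled, weights

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, p, m = 6, 8, 4, 3                        # toy sizes, not from the source
    hidden = rng.normal(size=(n, d))
    codes = rng.normal(size=(m, p))
    proj = rng.normal(size=(d, p))
    pooled, weights = poly_attention(hidden, codes, proj)
    print(pooled.shape, weights.shape)
```

Each row of `weights` is a separate attention distribution over the sequence, which is what lets the m output vectors specialize in different aspects of a user (UPE) or an item (CPE).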