UA-SID: Unified Advertisement Semantic ID—a discrete identifier for ads created by hierarchically quantizing multimodal embeddings and hashing non-semantic business signals
LazyAR: Lazy AutoRegression—a decoder architecture that processes initial layers in parallel (ignoring previous tokens) and only introduces autoregressive dependencies in later layers to speed up beam search
RSPO: Ranking-Guided Softmax Preference Optimization—a reinforcement learning algorithm that aligns model probabilities with list-wise ranking metrics (NDCG) based on ad value
VSL: Value-Aware Supervised Learning—a training objective that weights standard next-token prediction loss by the user's long-term value and interaction depth
eCPM: effective Cost Per Mille—a metric representing the revenue generated per 1,000 ad impressions, used here as the reward signal
NDCG: Normalized Discounted Cumulative Gain—a measure of ranking quality that accounts for position and item relevance
SFT: Supervised Fine-Tuning—training the model on labeled data (user history) before applying reinforcement learning
Beam Search: A decoding search algorithm that, at each step, keeps only a fixed number of the highest-scoring partial sequences (the beam) and expands those, rather than exploring every candidate
RQ-Kmeans: Residual Quantized K-means—a method to compress vectors into discrete codes by recursively clustering residuals
MTP: Multi-Token Prediction—a training technique (often auxiliary) where the model predicts multiple future tokens at once to improve representation learning
DLRM: Deep Learning Recommendation Model—the standard non-generative architecture for recommendation, typically using embedding tables and MLPs
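
To make the eCPM and NDCG entries concrete, the two metrics can be sketched in a few lines of Python. This is a minimal illustration, not the implementation behind RSPO: the function names `dcg`, `ndcg`, and `ecpm` are hypothetical, the gain is assumed to be linear in the item's value (some definitions use 2^relevance - 1 instead), and the per-item values stand in for whatever ad value the ranking is judged on.

```python
import math

def dcg(gains):
    """Discounted cumulative gain: the gain at 1-based rank i is divided by log2(i + 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))

def ndcg(gains):
    """DCG of the given order divided by the DCG of the ideal (descending-gain) order.

    Assumption: linear gain. Returns 0.0 when every gain is zero.
    """
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0

def ecpm(revenue, impressions):
    """Revenue per 1,000 impressions."""
    return 1000.0 * revenue / impressions

# Hypothetical per-ad values, listed in the order the model ranked the ads.
ranked_values = [1.0, 3.0, 2.0]
score = ndcg(ranked_values)  # below 1.0 because the most valuable ad is not ranked first
```

A list already sorted by descending value scores exactly 1.0, which is why NDCG is a convenient list-wise target: it rewards placing high-value items early and is comparable across lists of different total value.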
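
The RQ-Kmeans entry describes recursive clustering of residuals; the idea can be sketched with a small pure-Python k-means. Everything here is an illustrative assumption rather than the method used to build UA-SID: the helper names (`kmeans`, `fit_residual_codebooks`, `encode`, `decode`), the number of levels, the codebook size, and the toy data are all hypothetical.

```python
import random

def _sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def _nearest(v, centroids):
    """Index of the centroid closest to v (ties go to the lowest index)."""
    return min(range(len(centroids)), key=lambda i: _sqdist(v, centroids[i]))

def kmeans(vectors, k, iters=20, seed=0):
    """Plain Lloyd's k-means; centroids are initialised from k sampled points."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for v in vectors:
            buckets[_nearest(v, centroids)].append(v)
        for i, members in enumerate(buckets):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def fit_residual_codebooks(vectors, levels=2, k=2, seed=0):
    """Fit one codebook per level; each level clusters what the previous levels left over."""
    codebooks = []
    residuals = [list(v) for v in vectors]
    for level in range(levels):
        centroids = kmeans(residuals, k, seed=seed + level)
        codebooks.append(centroids)
        residuals = [
            [x - c for x, c in zip(r, centroids[_nearest(r, centroids)])]
            for r in residuals
        ]
    return codebooks

def encode(vector, codebooks):
    """Turn a vector into one discrete code per level (coarse to fine)."""
    codes, residual = [], list(vector)
    for centroids in codebooks:
        idx = _nearest(residual, centroids)
        codes.append(idx)
        residual = [x - c for x, c in zip(residual, centroids[idx])]
    return codes

def decode(codes, codebooks):
    """Approximate the original vector by summing the selected centroids."""
    out = [0.0] * len(codebooks[0][0])
    for idx, centroids in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, centroids[idx])]
    return out

# Toy data: two well-separated groups, each with a small internal offset.
data = [[0, 0], [0, 1], [10, 10], [10, 11]]
codebooks = fit_residual_codebooks(data, levels=2, k=2)
codes = [encode(v, codebooks) for v in data]
```

In this toy setup the first code captures which group a vector belongs to and the second captures the offset within the group, which is the coarse-to-fine hierarchy that makes such codes usable as semantic identifiers.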