Continual Pre-training (CPT): Further training a base LLM on domain-specific data to adapt it to a new domain or task
User Interaction History (UIH): A sequence of items a user has interacted with, used to predict future interests
Collaborative Filtering (CF): A recommendation method based on patterns in user-item interactions (e.g., users who bought X also bought Y)
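Power-law scaling: A relationship in which loss falls as a power of resources (data or compute), L(D) = L_inf + A·D^(-alpha); each multiplicative increase in resources shrinks the reducible loss (L - L_inf) by a constant factor, so the trend appears as a straight line on a log-log plot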
Recall@K: An evaluation metric measuring whether the true target item appears among the top K recommendations; averaged over users, it gives the fraction of users whose target is retrieved
Scaling exponent (alpha): The exponent of the power-law scaling relationship, which sets how quickly loss decreases as dataset size grows; a higher alpha means each increase in data yields a larger reduction in loss
L_inf: Irreducible loss; the lowest loss a model can theoretically reach given infinite data, i.e., the constant term of a fitted power law
Asymptotic loss: The theoretical minimum loss a model approaches as training data grows without bound; in a fitted power law this corresponds to L_inf
Perplexity: A measure of how well a probability model predicts a sample, computed as the exponential of the average negative log-likelihood per token; lower is better
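Several of the quantities above can be made concrete with a small Python sketch. It assumes the scaling form L(D) = L_inf + A·D^(-alpha), a known L_inf when fitting alpha (in practice L_inf, A, and alpha are usually fitted jointly by nonlinear least squares), a single held-out target item per user for Recall@K, and per-token probabilities that have already been computed for perplexity. The function names and the coefficient A are illustrative assumptions, not taken from the source.

```python
import math
from typing import Sequence, Tuple


def power_law_loss(d: float, l_inf: float, a: float, alpha: float) -> float:
    """Loss predicted by the assumed form L(D) = L_inf + A * D**(-alpha)."""
    return l_inf + a * d ** (-alpha)


def fit_alpha(ds: Sequence[float], losses: Sequence[float], l_inf: float) -> Tuple[float, float]:
    """Estimate (alpha, A) by a least-squares line in log-log space.

    Assumes l_inf is known, so log(L - L_inf) = log(A) - alpha * log(D)
    is linear in log(D); the negated slope is alpha.
    """
    xs = [math.log(d) for d in ds]
    ys = [math.log(loss - l_inf) for loss in losses]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    a = math.exp(mean_y - slope * mean_x)
    return -slope, a


def recall_at_k(ranked: Sequence[Sequence[int]], targets: Sequence[int], k: int) -> float:
    """Fraction of users whose single target item appears in their top-k list."""
    if not targets:
        raise ValueError("need at least one user")
    hits = sum(1 for items, t in zip(ranked, targets) if t in items[:k])
    return hits / len(targets)


def perplexity(token_probs: Sequence[float]) -> float:
    """exp of the mean negative log-likelihood of the observed tokens."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)


if __name__ == "__main__":
    ds = [1e6, 1e7, 1e8, 1e9]
    losses = [power_law_loss(d, l_inf=2.0, a=50.0, alpha=0.3) for d in ds]
    print(fit_alpha(ds, losses, l_inf=2.0))  # recovers roughly (0.3, 50.0)
    print(recall_at_k([[3, 1, 2], [5, 4, 6]], [1, 6], k=2))  # 0.5
    print(perplexity([0.25] * 4))  # about 4.0
```

The log-log fit works only because L_inf is subtracted first; without that step the data would curve toward the asymptote instead of forming a straight line. A uniform model over four tokens has perplexity 4, which matches the intuition of "choosing among four equally likely options."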