RLSO: Reinforcement Learning for System Optimization—the proposed algorithm that optimizes the LLM based on feedback (rewards) from the recommendation system
Profile Encoder: An LLM that maps user interaction history to a natural language profile
Recommender Decoder: A model (often embedding-based) that takes a user profile and item metadata to generate a ranked list of items
Mxbai: A text embedding model from Mixedbread AI, used as the backbone for the recommender decoder
NDCG: Normalized Discounted Cumulative Gain—a measure of ranking quality that accounts for the position of relevant items
MRR: Mean Reciprocal Rank, the average over queries of the reciprocal of the rank at which the first relevant item appears (1 if it is ranked first, 1/2 if second, and so on)
Recall@K: The fraction of a user's relevant items that appear in the top-K recommendations
InfoNCE: A contrastive loss function used to learn representations by maximizing agreement between positive pairs and minimizing it between negative pairs
KL divergence: Kullback-Leibler divergence, an asymmetric measure of how much one probability distribution differs from a second, reference probability distribution
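
The metric and loss definitions above can be made concrete with a minimal sketch. It is an illustration rather than the evaluation code behind RLSO. It assumes binary relevance (an item is either relevant or not) and discrete distributions given as lists. All function names are hypothetical, and the default temperature is an arbitrary choice.

```python
import math


def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant items that appear in the top-k of `ranked`."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0.0 if none is ranked."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(rankings, relevants):
    """Average reciprocal rank over several queries (e.g. one per user)."""
    scores = [reciprocal_rank(r, rel) for r, rel in zip(rankings, relevants)]
    return sum(scores) / len(scores) if scores else 0.0


def ndcg_at_k(ranked, relevant, k):
    """NDCG@k with binary gains: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, item in enumerate(ranked[:k], start=1)
        if item in relevant
    )
    # Ideal ranking places all relevant items (up to k of them) first.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


def kl_divergence(p, q):
    """KL(p || q) in nats; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def info_nce(pos_score, neg_scores, temperature=0.1):
    """InfoNCE loss for one positive similarity score against negatives.

    Equals -log(exp(pos/t) / sum_j exp(s_j/t)), computed with the
    log-sum-exp trick for numerical stability. The temperature value
    is an assumption, not taken from the source.
    """
    logits = [pos_score / temperature] + [s / temperature for s in neg_scores]
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]


if __name__ == "__main__":
    ranked = ["a", "b", "c", "d"]   # hypothetical recommendation list
    relevant = {"b", "d"}           # hypothetical ground-truth items
    print(recall_at_k(ranked, relevant, k=2))
    print(reciprocal_rank(ranked, relevant))
    print(ndcg_at_k(ranked, relevant, k=4))
```

In this toy example only one of the two relevant items is in the top 2, so Recall@2 is 0.5, and the first relevant item sits at rank 2, so its reciprocal rank is also 0.5. NDCG falls below 1 because the relevant items are not placed at the top of the list.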