Slate Recommendation: Recommending an ordered sequence of items (e.g., a playlist or carousel) rather than a single item or unordered set
World Model: A model that simulates the environment (in this case, the user) to predict how it will respond to agent actions (recommendations)
Regret: The difference in utility between the item the user actually preferred and the item the model selected; a measure of 'how much worse' the model's choice was
Pairwise Reasoning: Evaluating items or slates by comparing them two at a time rather than assigning absolute scores to each
Transitivity: A logical axiom stating that if A is preferred to B, and B is preferred to C, then A must be preferred to C
Asymmetry: A logical axiom stating that if A is preferred to B, then B cannot be preferred to A
nDCG: Normalized Discounted Cumulative Gain—a measure of ranking quality that weights correct items higher if they appear earlier in the list
BPR: Bayesian Personalized Ranking—a pairwise optimization framework commonly used in recommender systems
Zero-shot: Using a pre-trained model to perform a task without any specific training examples for that task