SBR: Session-based Recommendation—predicting the next user action based on a short sequence of recent interactions without long-term user profiles.
Self-reflection: A process where an LLM analyzes its own output and errors to generate feedback or corrections.
Hints: Short, specialized textual guidelines generated by the LLM during reflection to correct specific recommendation errors (e.g., 'Focus on the director style').
PPO: Proximal Policy Optimization—a reinforcement learning algorithm used here to train the retrieval agent to select the most helpful hints.
MDP: Markov Decision Process—a mathematical framework for modeling decision-making, defined by states, actions, rewards, and transitions.
NDCG: Normalized Discounted Cumulative Gain, a measure of ranking quality that gives more credit the nearer a relevant item sits to the top of the recommendation list, normalized by the score of an ideal ranking.
HR: Hit Ratio, the fraction of test sessions in which the correct next item appears among the top-K recommendations.
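
The two ranking metrics defined above (HR and NDCG) can be made concrete with a minimal sketch. It assumes the common session-based setup in which each test session has exactly one ground-truth next item, so the ideal DCG is 1 and NDCG@K reduces to 1 / log2(rank + 1) when the item is ranked within the top K, and 0 otherwise. The function names (`hit_at_k`, `ndcg_at_k`, `evaluate`) and the toy data are illustrative assumptions, not taken from the source.

```python
import math
from typing import Hashable, Sequence, Tuple


def hit_at_k(ranked: Sequence[Hashable], target: Hashable, k: int) -> float:
    """Return 1.0 if the target item is among the top-k ranked items, else 0.0."""
    return 1.0 if target in ranked[:k] else 0.0


def ndcg_at_k(ranked: Sequence[Hashable], target: Hashable, k: int) -> float:
    """NDCG@k under the assumption of a single relevant item per session.

    With one relevant item the ideal DCG is 1 / log2(2) = 1, so the
    normalized score is simply the discounted gain at the item's rank.
    """
    top_k = list(ranked[:k])
    if target not in top_k:
        return 0.0
    rank = top_k.index(target) + 1  # 1-based position in the list
    return 1.0 / math.log2(rank + 1)


def evaluate(
    rankings: Sequence[Sequence[Hashable]],
    targets: Sequence[Hashable],
    k: int,
) -> Tuple[float, float]:
    """Average HR@k and NDCG@k over all test sessions."""
    if len(rankings) != len(targets) or not targets:
        raise ValueError("rankings and targets must be non-empty and equal in length")
    n = len(targets)
    hr = sum(hit_at_k(r, t, k) for r, t in zip(rankings, targets)) / n
    ndcg = sum(ndcg_at_k(r, t, k) for r, t in zip(rankings, targets)) / n
    return hr, ndcg


# Toy example (hypothetical data): three sessions, evaluated at k = 2.
example_rankings = [["a", "b", "c"], ["x", "y", "z"], ["p", "q", "r"]]
example_targets = ["a", "y", "missing"]
example_hr, example_ndcg = evaluate(example_rankings, example_targets, k=2)
print(f"HR@2 = {example_hr:.3f}, NDCG@2 = {example_ndcg:.3f}")
```

In the toy data the first target is ranked first, the second is ranked second, and the third is absent, so HR@2 counts two hits out of three while NDCG@2 gives the second session less credit than the first because of its lower position.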