IRS: Interactive Recommender Systems, which adapt recommendations in real time based on user feedback
Filter Bubble: A state of intellectual isolation where a user is exposed only to content that aligns with their existing preferences, excluding diverse viewpoints
PPO: Proximal Policy Optimization—a reinforcement learning algorithm used here to train the low-level policy learner
MDP: Markov Decision Process—a mathematical framework for modeling decision-making where outcomes are partly random and partly under the control of a decision maker
Semantic Planning: High-level decision making focused on broad categories or topics rather than specific items
Reflection Pool: A memory bank storing textual critiques/summaries of past user sessions, used to prompt the LLM for better future planning
Gaussian distribution: A continuous probability distribution used here to sample virtual item embeddings for exploration
Transformer: A neural network architecture using self-attention, used here to encode user interaction history
Soft filter: A mechanism that prioritizes items from the selected categories without strictly forbidding other items, i.e., without enforcing the category mask as a hard constraint over the item space
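
To make the soft-filter entry concrete, here is a minimal sketch contrasting it with a hard category mask. The source does not say how prioritization is implemented, so the additive score bonus, the function names (soft_filter, hard_filter, rank), and the toy scores are all illustrative assumptions rather than the method described.

```python
# Hypothetical illustration: the additive bonus is an assumed way to
# "prioritize" selected categories; the source does not specify one.

def soft_filter(scores, item_categories, selected, bonus=0.6):
    """Boost items in the selected categories; other items keep their score."""
    return [s + bonus if c in selected else s
            for s, c in zip(scores, item_categories)]

def hard_filter(scores, item_categories, selected):
    """Strict mask: items outside the selected categories can never be chosen."""
    return [s if c in selected else float("-inf")
            for s, c in zip(scores, item_categories)]

def rank(scores):
    """Return item indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

base_scores = [0.9, 0.4, 0.7, 0.2]              # toy relevance scores
categories = ["news", "sports", "music", "sports"]
selected = {"sports"}                            # categories chosen by the planner

soft_ranking = rank(soft_filter(base_scores, categories, selected))
hard_scores = hard_filter(base_scores, categories, selected)
hard_ranking = rank(hard_scores)
```

Under the soft filter, a strong item from an unselected category (item 0 here) can still outrank a weak item from a selected one, whereas under the hard mask it is excluded entirely.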