LRM: Large Reasoning Model—An LLM specifically optimized for complex multi-step reasoning (e.g., math, logic), often via reinforcement learning or specialized data.
R2P: Reinforced Reasoning for Personalization—The authors' proposed framework to guide LRMs using structured templates and intervention.
LaMP: Language Model Personalization benchmark—A dataset collection for evaluating personalization capabilities across citation, news, movie, and product domains.
RAG: Retrieval-Augmented Generation—Enhancing model inputs with relevant external data (here, user history) to improve context awareness.
Divergent Thinking: The ability to explore multiple possible solutions or creative directions, in contrast to 'convergent thinking', which narrows down to one correct answer.
HRT: Hierarchical Reasoning Thought template—A structured prompt used in R2P to decompose personalization tasks into specific sub-steps.
RPI: Reasoning Process Intervention—A mechanism to monitor the model's output stream and inject corrective instructions if it deviates from the HRT.
SRM: Self-Referencing Module—A method where the model generates multiple candidate responses and then synthesizes them into a final answer to ensure consistency.
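
To make the SRM entry concrete, here is a minimal sketch of the generate-then-synthesize pattern it describes. It is not the authors' implementation: the function name `self_reference`, the `generate` callable (a stand-in for any LRM call), the candidate count, and the wording of the synthesis prompt are all assumptions made for illustration.

```python
from typing import Callable, List


def self_reference(
    prompt: str,
    generate: Callable[[str], str],  # hypothetical wrapper around a model call
    n_candidates: int = 3,           # assumed default, not taken from the source
) -> str:
    """Generate several candidate responses, then ask the model to merge them."""
    # Step 1: sample multiple candidate responses for the same prompt.
    candidates: List[str] = [generate(prompt) for _ in range(n_candidates)]

    # Step 2: show the model its own candidates and request one final answer.
    listing = "\n".join(
        f"Candidate {i + 1}: {text}" for i, text in enumerate(candidates)
    )
    synthesis_prompt = (
        f"{prompt}\n\n{listing}\n\n"
        "Synthesize the candidates above into one consistent final answer."
    )
    return generate(synthesis_prompt)
```

With a sampling-based model, `generate` would need a non-zero temperature so that the candidates actually differ; with deterministic decoding the synthesis step would see identical candidates.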