TRM: Traditional Recommendation Model, a classic deep learning recommender (such as SASRec) trained on interaction data to predict user preferences
SASRec: Self-Attentive Sequential Recommendation, a specific TRM architecture that uses self-attention to model user interaction sequences
Preference-Aware TRM: A modified TRM proposed in this paper that combines interaction history embeddings with text embeddings of the LLM-generated user preference
Reinforce++: A reinforcement learning algorithm used here to optimize the LLM's policy for both interaction validity and recommendation accuracy
Point-wise reward: A reward signal evaluating the quality of individual recommended items based on textual and collaborative similarity to the ground truth
List-wise reward: A reward signal evaluating the overall quality of the generated item list, using metrics such as hit rate and the rank position of the ground-truth item
Cold-Start RL: The first training stage focused on teaching the LLM the correct format and interaction pattern for invoking the TRM tool
Deep Research: An AI agent concept referenced as inspiration, where LLMs autonomously interact with tools (like search engines) to solve complex tasks
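
As a rough illustration of the list-wise reward entry above, the sketch below scores a generated item list by whether the ground-truth item appears in it (hit) and how high it is ranked. The function name `listwise_reward`, the cutoff `k`, the reciprocal-rank term, and the equal 0.5/0.5 weighting are all assumptions made for illustration; the paper's actual reward formula is not reproduced here.

```python
from typing import Sequence


def listwise_reward(recommended: Sequence[str], target: str, k: int = 10) -> float:
    """Hypothetical list-wise reward: combines a hit indicator with reciprocal rank.

    Returns 0.0 when the ground-truth item is missing from the top-k list,
    and a value in (0.5, 1.0] when it is present (higher for better ranks).
    """
    top_k = list(recommended[:k])
    if target not in top_k:
        return 0.0  # miss: no reward
    rank = top_k.index(target) + 1   # 1-based position of the ground truth
    hit = 1.0                        # hit indicator (item is in the list)
    reciprocal_rank = 1.0 / rank     # rewards placing the item near the top
    # Assumed equal weighting of the two signals.
    return 0.5 * hit + 0.5 * reciprocal_rank
```

For example, `listwise_reward(["a", "b", "c"], "b")` places the target at rank 2, so the hit term contributes 0.5 and the rank term 0.25.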