Collaborative Signal Translation: A mechanism that retrieves behavioral neighbors (similar users/items) from a graph and converts these statistical patterns into natural language evidence
MARS: Multi-Agent Recommender System—the teacher framework used to generate reasoning-rich trajectories
STAR: Single-agent Trajectory-Aligned Recommender—the final efficient student model
GRPO: Group Relative Policy Optimization—an RL algorithm that optimizes a policy by comparing a group of outputs generated for the same input, estimating baselines from the group average rather than a separate value network
Trajectory Serialization: Converting complex, multi-turn agent communication logs into a linear text sequence with special tokens (e.g., <tool_call>) for training
Outcome-based Filtering: A data-cleaning step that keeps only the teacher trajectories whose final prediction matches the ground truth; the rest are discarded before training the student
SFT: Supervised Fine-Tuning, which trains the student model to reproduce the teacher's trajectories token by token
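
Three of the terms above (Outcome-based Filtering, Trajectory Serialization, and the group-average baseline in GRPO) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration, not the authors' implementation: the dictionary layout, the function names, the `think` role, the closing tags, and the sample data are hypothetical. Dividing by the group standard deviation is the normalization commonly used with GRPO; the definition above only mentions the group average.

```python
from statistics import mean, pstdev


def keep_correct(trajectories):
    """Outcome-based filtering: keep trajectories whose prediction matches the ground truth."""
    return [t for t in trajectories if t["prediction"] == t["ground_truth"]]


def serialize(turns):
    """Trajectory serialization: flatten multi-turn messages into one tagged string.

    Each turn is wrapped in a tag named after its role (hypothetical format;
    only <tool_call> is mentioned in the glossary).
    """
    return "\n".join(f"<{t['role']}>{t['content']}</{t['role']}>" for t in turns)


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages for one group of outputs sampled for the same input.

    The baseline is the group mean, so no separate value network is needed.
    """
    baseline = mean(rewards)
    scale = pstdev(rewards) + eps  # eps avoids division by zero when all rewards are equal
    return [(r - baseline) / scale for r in rewards]


# Hypothetical teacher trajectories: the second one predicts the wrong item.
trajectories = [
    {
        "turns": [
            {"role": "think", "content": "neighbors liked item 42"},
            {"role": "tool_call", "content": "get_neighbors(user=7)"},
        ],
        "prediction": 42,
        "ground_truth": 42,
    },
    {
        "turns": [{"role": "think", "content": "guessing"}],
        "prediction": 13,
        "ground_truth": 42,
    },
]

# Filter first, then serialize the survivors into SFT training text.
sft_texts = [serialize(t["turns"]) for t in keep_correct(trajectories)]

# Rewards for four sampled outputs of the same input (1.0 = correct, 0.0 = wrong).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

In this sketch, outputs that beat the group average receive a positive advantage and the others a negative one; when every output in a group earns the same reward, all advantages are zero and that group contributes no learning signal.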