
RecGPT-V2 Technical Report

(Alibaba) RecGPT team
arXiv, 12/2025
Recommendation Agent RL P13N

📝 Paper Summary

LLM-based Recommendation, Agentic Recommender Systems
RecGPT-V2 restructures recommender systems into a coordinated multi-agent framework that compresses user behavior and employs specialized agents to reason about intent, reducing computation while improving personalization.
Core Problem
Previous LLM-based recommenders like RecGPT-V1 suffer from computational inefficiency due to redundant full-sequence processing, generate homogeneous explanations via fixed templates, and lack stability when optimizing multiple conflicting objectives.
Why it matters:
  • Industrial systems operate at massive scale; repeatedly encoding 32K-token user histories for every reasoning route is prohibitively expensive
  • Users disengage when presented with generic, repetitive explanations that fail to account for real-time context like weather or trends
  • Simple supervised learning fails to balance competing goals (accuracy vs. diversity) in dynamic environments, leading to suboptimal recommendations
Concrete Example: In RecGPT-V1, multiple reasoning routes (e.g., weather route, trend route) would each independently process the same 32K-token user history, creating 13.46% redundant candidate overlap. Additionally, explanations were generated using rigid templates, failing to adapt tone or content to specific user contexts.
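The cost of re-encoding the same history per route can be illustrated with simple arithmetic. The sketch below is not from the paper: the function name, the per-route prompt length, and the number of routes are hypothetical, and only the 32K-token history length comes from the summary above. It compares the tokens encoded when every route processes the full history independently against the case where the history is encoded once and shared.

```python
def encoded_tokens(history_tokens: int, prompt_tokens: int,
                   num_routes: int, shared_history: bool) -> int:
    """Count input tokens encoded across all reasoning routes.

    shared_history=False: each route encodes the history plus its own prompt.
    shared_history=True:  the history is encoded once; each route only adds
                          its own route-specific prompt.
    """
    if shared_history:
        return history_tokens + num_routes * prompt_tokens
    return num_routes * (history_tokens + prompt_tokens)


# Hypothetical setting: 4 routes, 500-token route prompts, 32K-token history.
independent = encoded_tokens(32_000, 500, num_routes=4, shared_history=False)
shared = encoded_tokens(32_000, 500, num_routes=4, shared_history=True)
print(independent, shared)  # 130000 34000
```

Under these assumed numbers, independent routes encode nearly four times as many tokens as a shared encoding; the gap grows linearly with the number of routes.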
Key Novelty
Hierarchical Multi-Agent System (HMAS) with Hybrid Representation Inference
  • Replaces isolated reasoning pipelines with a collaborative team: a Planner decomposes intent, specialized Experts generate tags, and an Arbiter refines the final list
  • Compresses lengthy user behavior sequences into 'atomic units' (single vectors) using a trained adaptor, drastically shortening input length while preserving semantic meaning
  • Uses 'Constrained Reward Shaping' in reinforcement learning to satisfy secondary goals (like diversity) before optimizing primary accuracy, preventing objective conflicts
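One plausible reading of the "satisfy secondary goals first" idea is a gated reward: the primary accuracy reward counts only when every secondary metric clears a minimum threshold. The sketch below is an assumption-laden illustration, not the paper's formulation; the function names, metric names, and thresholds are hypothetical. A weighted-sum reward is included for contrast, since that is the design in which objectives can trade off against each other.

```python
from typing import Mapping


def constrained_reward(primary: float,
                       secondary: Mapping[str, float],
                       thresholds: Mapping[str, float]) -> float:
    """Return the primary reward only if all secondary constraints hold.

    Hypothetical gating rule: any secondary metric below its threshold
    (or missing entirely) zeroes the reward.
    """
    for name, minimum in thresholds.items():
        if secondary.get(name, 0.0) < minimum:
            return 0.0
    return primary


def weighted_sum_reward(primary: float,
                        secondary: Mapping[str, float],
                        weights: Mapping[str, float]) -> float:
    """Baseline for contrast: secondary metrics are added with fixed weights."""
    return primary + sum(w * secondary.get(name, 0.0)
                         for name, w in weights.items())


# A high-accuracy but low-diversity output earns nothing under the gate.
print(constrained_reward(0.9, {"diversity": 0.2}, {"diversity": 0.5}))  # 0.0
print(constrained_reward(0.9, {"diversity": 0.6}, {"diversity": 0.5}))  # 0.9
```

With the gate, a policy cannot raise its reward by sacrificing diversity for accuracy, whereas a weighted sum lets a large gain in one term mask a loss in another.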
Architecture
Figure 2
The overall RecGPT-V2 architecture pipeline from user behavior input to final recommendation output.
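The Planner, Expert, and Arbiter roles described under Key Novelty can be sketched as a small control flow. Everything here is a toy stand-in: the function names, the fixed list of sub-intents, and the lambda experts are hypothetical, and in the real system each role would be an LLM call. The Arbiter's refinement is reduced to order-preserving deduplication and truncation purely for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical expert signature: (compressed_history, sub_intent) -> tags
Expert = Callable[[str, str], List[str]]


def planner(compressed_history: str) -> List[str]:
    """Toy planner: returns a fixed set of sub-intents (an assumption)."""
    return ["weather", "trend"]


def arbiter(candidates: List[str], limit: int) -> List[str]:
    """Toy arbiter: drop duplicate tags, keep first-seen order, cap length."""
    seen = set()
    final = []
    for tag in candidates:
        if tag not in seen:
            seen.add(tag)
            final.append(tag)
    return final[:limit]


def recommend(compressed_history: str,
              experts: Dict[str, Expert],
              limit: int = 3) -> List[str]:
    """Run planner -> experts -> arbiter over one shared history input."""
    candidates: List[str] = []
    for intent in planner(compressed_history):
        expert = experts.get(intent)
        if expert is not None:
            candidates.extend(expert(compressed_history, intent))
    return arbiter(candidates, limit)


toy_experts: Dict[str, Expert] = {
    "weather": lambda history, intent: ["raincoat", "umbrella"],
    "trend": lambda history, intent: ["umbrella", "sneakers"],
}
print(recommend("<compressed history>", toy_experts))
# ['raincoat', 'umbrella', 'sneakers']
```

The point of the sketch is the data flow: all experts read the same compressed history, and overlapping candidates are resolved in one place rather than by each route separately.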
Evaluation Highlights
  • +3.64% Item Page Views (IPV) and +3.01% Click-Through Rate (CTR) in online A/B tests on Taobao
  • Reduces GPU consumption by 60.0% and improves Model FLOPs Utilization (MFU) by 53.7% compared to RecGPT-V1
  • Achieves a 24.0% improvement in human-evaluated tag quality pass rate using Constrained Reward Shaping
Breakthrough Assessment
9/10
Solving the token-cost bottleneck of LLM recommenders via atomic compression while simultaneously deploying a complex multi-agent architecture at industrial scale (Taobao) is a major engineering and algorithmic milestone.