RecoWorld: Building Simulated Environments for Agentic Recommender Systems

Fei Liu, Xinyu Lin, Hanchao Yu, Mingyuan Wu, Jianyu Wang, Qiang Zhang, Zhuokai Zhao, Yinglong Xia, Yao Zhang, Weiwei Li, Mingze Gao, Qifan Wang, Lizhu Zhang, Benyu Zhang, Xiangjun Fan
Meta Platforms, Inc., National University of Singapore, University of Illinois
arXiv (2025)
Recommendation Agent Memory RL MM P13N

📝 Paper Summary

Recommender System Simulation Agentic AI Evaluation
RecoWorld is a blueprint for a simulated environment where LLM-based users provide natural language instructions to recommender agents, enabling the training of interactive systems without risking real user retention.
Core Problem
Traditional offline evaluation metrics such as Recall suffer from exposure bias, while online A/B tests are slow and too risky for trying radically new agentic strategies.
Why it matters:
  • Existing offline metrics are computed on logged interactions, so they reward reproducing known patterns rather than discovering new user interests (exposure bias).
  • Agentic recommenders need to learn to follow instructions and plan over long horizons, capabilities that static datasets cannot evaluate.
  • Testing unproven agentic behaviors on real users risks degrading the user experience and causing churn.
Concrete Example: In a standard system, a bored user simply leaves. In RecoWorld, a simulated user about to churn issues an explicit instruction like 'show me more interesting content,' challenging the recommender to interpret this feedback and immediately adjust the list to retain the user.
Key Novelty
Dual-View Agentic Simulation Environment
  • Models the user not just as a click-generator but as an agent that reflects on dissatisfaction and issues natural language instructions (e.g., 'stop showing me sports') to the recommender.
  • Establishes a multi-turn feedback loop where the Recommender Agent must 'follow instructions' to maximize a long-term reward signal (session retention) rather than immediate clicks.
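The feedback loop described in the two points above can be illustrated with a toy sketch. This is not the paper's implementation: the rule-based `SimulatedUser` stands in for an LLM-based user, and `CATALOG`, `RecommenderAgent`, `run_session`, the `patience` threshold, and the "stop showing me ..." instruction format are all hypothetical names and choices made for illustration. The session-level reward is assumed here to be the number of turns the user stays.

```python
from dataclasses import dataclass, field

# Hypothetical toy catalog: topic -> items (only topics are used below).
CATALOG = {
    "sports": ["match recap", "transfer news"],
    "cooking": ["pasta basics", "knife skills"],
    "travel": ["city guide", "packing tips"],
}


@dataclass
class SimulatedUser:
    """Rule-based stand-in for an LLM user (an assumption, not the paper's model)."""
    liked: set
    patience: int = 2   # consecutive disliked turns tolerated before leaving
    strikes: int = 0

    def react(self, topic):
        """Return (stays_in_session, instruction_or_None) for a recommended topic."""
        if topic in self.liked:
            self.strikes = 0
            return True, None
        self.strikes += 1
        if self.strikes >= self.patience:
            return False, None  # user churns
        # Dissatisfied but still present: issue a natural language instruction.
        return True, f"stop showing me {topic}"


@dataclass
class RecommenderAgent:
    follows_instructions: bool = True
    blocked: set = field(default_factory=set)

    def recommend(self):
        """Pick the first topic that has not been blocked."""
        for topic in CATALOG:
            if topic not in self.blocked:
                return topic
        return None

    def follow(self, instruction):
        """Parse the toy instruction format and update the blocklist."""
        prefix = "stop showing me "
        if self.follows_instructions and instruction and instruction.startswith(prefix):
            self.blocked.add(instruction[len(prefix):])


def run_session(user, agent, max_turns=5):
    """Run one multi-turn session; the reward is the number of turns retained."""
    turns = 0
    for _ in range(max_turns):
        topic = agent.recommend()
        if topic is None:
            break
        stays, instruction = user.react(topic)
        if not stays:
            break
        turns += 1
        agent.follow(instruction)
    return turns


compliant = run_session(SimulatedUser(liked={"cooking"}), RecommenderAgent())
ignoring = run_session(
    SimulatedUser(liked={"cooking"}),
    RecommenderAgent(follows_instructions=False),
)
print(compliant, ignoring)
```

In this toy setup the agent that follows the instruction keeps the user for the full session, while the agent that ignores it loses the user on the second turn, which is the retention gap the long-term reward is meant to capture.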
Evaluation Highlights
  • The paper presents a blueprint and architecture rather than empirical benchmark results.
  • Proposed evaluation compares simulated session trajectories against human annotator trajectories to validate realism.
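One simple way to picture the trajectory comparison is a position-wise agreement score. This summary does not name the metric the paper proposes, so the function below is only an assumed example: `step_agreement` and the action labels ("click", "skip", "instruct", "leave") are hypothetical.

```python
def step_agreement(simulated, human):
    """Fraction of positions where both trajectories take the same action.

    Positions beyond the end of the shorter trajectory count as disagreements.
    """
    longest = max(len(simulated), len(human))
    if longest == 0:
        return 1.0  # two empty trajectories trivially agree
    matches = sum(s == h for s, h in zip(simulated, human))
    return matches / longest


sim_trajectory = ["click", "skip", "instruct", "click"]
human_trajectory = ["click", "skip", "instruct", "leave"]
print(step_agreement(sim_trajectory, human_trajectory))  # 0.75
```

A score near 1 would suggest the simulated user behaves like the human annotator on that session; a real validation would likely need richer measures than exact per-step matching.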
Breakthrough Assessment
5/10
This is a position/blueprint paper proposing a novel environment design. While the concept of 'instruction-following simulation' is significant, the paper explicitly states it does not present experimental results.