
ChainRec: An Agentic Recommender Learning to Route Tool Chains for Diverse and Evolving Interests

Fuchun Li, Qian Li, Xingyu Gao, Bocheng Pan, Yang Wu, Jun Zhang, Huan Yu, Jie Jiang, Jinsheng Xiao, Hailong Shi
Chinese Academy of Sciences, Tencent, Wuhan University
arXiv (2026)
Recommendation Agent RL Memory Reasoning

📝 Paper Summary

Agentic Recommender Systems · LLM-based Recommendation · Tool-augmented Agents
ChainRec is an agentic recommender that learns to route reasoning tools dynamically at inference time: it decides what evidence to gather and when to stop, drawing on a standardized tool library and a preference-optimized planner.
Core Problem
Most agentic recommenders rely on fixed workflows or scripts that apply the same reasoning procedure across all scenarios, making them brittle when user contexts vary widely (e.g., cold-start vs. interest shifts).
Why it matters:
  • Fixed strategies fail to adapt: cold-start users need different evidence (e.g., demographics) than established users (e.g., long-term history), so a single procedure wastes compute on irrelevant steps.
  • Static pipelines cannot actively seek missing information when signals are sparse or noisy, leading to poorly grounded rankings.
  • Current LLM recommenders often assume near-complete context is provided upfront, whereas real-world agents must actively decide what to retrieve.
Concrete Example: In a cold-start scenario, a fixed-script agent might waste steps analyzing non-existent history. In contrast, ChainRec detects the sparse history and dynamically routes to demographic profiling tools. Conversely, during an interest shift, it pivots to gather immediate interaction evidence rather than relying on long-term preferences.
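The routing behaviour in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tool names, the `State` fields, the history threshold of 3, and the overlap test for an interest shift are all hypothetical, and the hand-written `plan_next` rule stands in for the learned planner that ChainRec actually trains.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class State:
    """Evidence accumulated so far for one recommendation request."""
    history: List[str]   # long-term item ids
    recent: List[str]    # item ids from the current session
    evidence: Dict[str, str] = field(default_factory=dict)


# Hypothetical tools: each reads the state and returns a short evidence note.
def profile_demographics(s: State) -> str:
    return "demographic profile"


def summarize_history(s: State) -> str:
    return f"summary of {len(s.history)} past items"


def inspect_recent(s: State) -> str:
    return f"recent signals: {', '.join(s.recent)}"


TOOLS: Dict[str, Callable[[State], str]] = {
    "profile_demographics": profile_demographics,
    "summarize_history": summarize_history,
    "inspect_recent": inspect_recent,
}


def plan_next(s: State) -> Optional[str]:
    """Rule-based stand-in for a learned planner; None means 'stop'."""
    if len(s.history) < 3:
        # Cold start (assumed threshold): history is too sparse to analyze.
        wanted = "profile_demographics"
    elif s.recent and not set(s.recent) & set(s.history):
        # Toy interest-shift test: recent items share nothing with history.
        wanted = "inspect_recent"
    else:
        wanted = "summarize_history"
    return None if wanted in s.evidence else wanted


def run_chain(s: State, max_steps: int = 5) -> List[str]:
    """Decide on a tool, act, record the observation, repeat until stop."""
    chain: List[str] = []
    for _ in range(max_steps):
        name = plan_next(s)
        if name is None:
            break
        s.evidence[name] = TOOLS[name](s)
        chain.append(name)
    return chain
```

Calling `run_chain(State(history=[], recent=[]))` routes to the demographic tool and then stops, while a user with a long history and unrelated recent items is routed to `inspect_recent` instead.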
Key Novelty
Observe–Decide–Act loop with State-Aware Tool Routing
  • Decouples capability from planning: constructs a standardized 'Tool Agent Library' (TAL) by mining expert reasoning chains for reusable patterns (e.g., 'GetReviews', 'SummarizeHistory').
  • Replaces static scripts with a learned Planner that dynamically selects the next tool based on the current accumulated evidence state.
  • Optimizes the planner using a two-stage recipe (SFT → DPO) to prefer efficient, high-utility tool chains over suboptimal ones.
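For readers unfamiliar with the DPO stage, the sketch below computes the standard DPO objective for a single preference pair, where the "chosen" and "rejected" sequences would be a better and a worse tool chain for the same state. The function name, argument names, and the default `beta=0.1` are illustrative assumptions; this summary does not say how ChainRec builds its pairs or which hyperparameters it uses.

```python
import math


def dpo_loss(
    logp_chosen: float,
    logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,  # assumed value, not taken from the paper
) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are sequence log-probabilities under the policy being trained
    and under the frozen reference (here, the SFT planner).
    """
    margin = beta * (
        (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    )
    # -log(sigmoid(margin)) == log(1 + exp(-margin)), computed stably.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

The loss equals log 2 when the policy matches the reference and falls as the policy shifts probability toward the chosen chain relative to the rejected one.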
Evaluation Highlights
  • Consistently improves average hit rate (Avg HR@{1,3,5}) over strong baselines (including ReAct and fixed-chain agents) on AgentRecBench across Amazon, Yelp, and Goodreads datasets.
  • Achieves notable gains in 'cold-start' and 'evolving-interest' scenarios where dynamic adaptation is critical.
  • Ablation studies confirm that both the standardized tool library and the preference-optimized (DPO) planning contribute significantly to performance.
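As a reminder of what the headline metric measures, here is a minimal sketch of HR@k and its average over k = 1, 3, 5, assuming one ground-truth item per test case. The function names are hypothetical, and the exact AgentRecBench protocol (candidate set size, tie handling) is not described in this summary.

```python
from typing import Sequence


def hit_rate_at_k(
    rankings: Sequence[Sequence[str]], targets: Sequence[str], k: int
) -> float:
    """Fraction of test cases whose target item appears in the top-k ranking."""
    if len(rankings) != len(targets) or not targets:
        raise ValueError("rankings and targets must be non-empty and aligned")
    hits = sum(target in ranking[:k] for ranking, target in zip(rankings, targets))
    return hits / len(targets)


def avg_hit_rate(
    rankings: Sequence[Sequence[str]],
    targets: Sequence[str],
    ks: Sequence[int] = (1, 3, 5),
) -> float:
    """Mean of HR@k over the given cutoffs."""
    return sum(hit_rate_at_k(rankings, targets, k) for k in ks) / len(ks)
```

With three test cases whose target is ranked 1st, 2nd, and 5th, HR@1, HR@3, and HR@5 are 1/3, 2/3, and 1, so the average is 2/3.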
Breakthrough Assessment
8/10
A strong conceptual advance: the paper moves recommendation agents from static reasoning chains to dynamic, learned routing. Decoupling tool standardization from policy optimization addresses a key rigidity in current agentic systems.