← Back to Paper List

Arag: Agentic retrieval augmented generation for personalized recommendation

Reza Yousefi Maragheh, Pratheek Vadla, Priyank Gupta, Kai Zhao, Aysenur Inan, Kehui Yao, Jianpeng Xu, Praveen Kanumala, Jason Cho, Sushant Kumar
Walmart Global Tech
arXiv preprint arXiv … (2025)
RAG Agent Memory P13N Recommendation

📝 Paper Summary

Agentic RAG pipeline LLM-based recommendation Memory recall
ARAG is a multi-agent framework that refines standard retrieval by employing specialized agents to summarize long-term user context, verify item relevance via natural language inference, and re-rank candidates based on synthesized intent.
Core Problem
Standard RAG in recommendation systems relies on simplistic retrieval mechanisms (like cosine similarity) that often fail to capture nuanced user preferences or dynamic session contexts.
Why it matters:
  • Static embedding matching struggles to comprehend implicit interests embedded in long-form user documents and reviews
  • Existing methods often prioritize simple recency or surface-level text matching over deep semantic alignment with user intent
  • Failure to accurately model complex user contexts leads to irrelevant suggestions, reducing user trust and engagement in recommendation platforms
Concrete Example: A standard RAG might retrieve a 'Dasein Hobo Handbag' simply because it is a bag, whereas ARAG, knowing the user specifically prefers 'vegan leather' and 'checkered styles' from their history, would prioritize the 'BUTIED Checkered Tote' instead.
Key Novelty
Multi-Agent Collaboration for Personalized Ranking
  • Decomposes the recommendation task into specialized sub-tasks: understanding the user, checking item entitlement (NLI), summarizing context, and final ranking
  • Uses a blackboard-style shared memory where agents write rationales (e.g., 'supports/contradicts' judgments), allowing the final ranker to reason over logic rather than just raw data
Architecture
Architecture Figure Figure 1
The multi-agent workflow of ARAG for personalized recommendation.
Evaluation Highlights
  • +42.1% improvement in NDCG@5 on the Amazon Clothing dataset compared to the best baseline (Recency-based Ranking)
  • +35.5% improvement in Hit@5 on the Amazon Clothing dataset compared to the best baseline
  • Consistent gains across diverse domains (Clothing, Electronics, Home), outperforming both Vanilla RAG and Recency heuristics
Breakthrough Assessment
7/10
Significant quantitative improvements (over 40%) in specific domains demonstrate the efficacy of agentic workflows over static RAG for recommendation, though the core components (NLI, summarization) are established techniques applied in a new pipeline.
×