
Retrieval Augmented Conversational Recommendation with Reinforcement Learning

Zhenrui Yue, Honglei Zhuang, Zhen Qin, Zhankui He, Huimin Zeng, Julian McAuley, Dong Wang
University of Illinois Urbana-Champaign, Google DeepMind
arXiv, April 2026
Recommendation RAG RL Benchmark P13N

📝 Paper Summary

Conversational Recommender Systems (CRS), Retrieval-Augmented Generation (RAG)
RAR is a two-stage conversational recommendation framework that aligns an embedding-based retriever with a black-box LLM generator using reinforcement learning driven by LLM ranking feedback.
Core Problem
Existing LLM-based conversational recommender systems either lack an external retrieval mechanism and a unified metadata corpus, which makes it hard to recommend novel or cold-start items, or pair the LLM with a retriever that is not aligned with the generator.
Why it matters:
  • LLMs rely on static pre-trained knowledge, so they are unaware of novel items unless they are retrained at considerable cost.
  • When a naive retriever returns sub-optimal or irrelevant candidates, the LLM generator often amplifies these deficiencies, which degrades recommendation accuracy.
  • Scaling retrieval with knowledge graphs requires intensive data preprocessing and incurs graph-indexing overhead.
Concrete Example: When a user asks for a recently released movie, a standalone LLM might hallucinate or fail to recommend it due to knowledge cutoffs, while a poorly aligned retriever might fetch irrelevant classic movies that the LLM then erroneously recommends.
Key Novelty
Retrieval Augmented Conversational Recommendation (RAR)
  • Separates the system into a lightweight retriever and a powerful black-box LLM generator to allow dynamic updates with novel items without retraining the LLM.
  • Uses the LLM's own outputs to evaluate the retriever's suggestions, updating the retriever via reinforcement learning to fetch items the LLM actually prefers.
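The feedback loop in the second bullet can be pictured as a policy-gradient update. The following is a minimal sketch under stated assumptions, not the paper's training procedure: it assumes a REINFORCE-style objective, a softmax retrieval policy over dot-product scores, and a hypothetical `llm_rank_reward` stub (reciprocal rank) standing in for the black-box LLM's ranking feedback. For brevity it updates the query embedding directly, whereas a real retriever would update its encoder parameters. All function names and the reward definition are illustrative.

```python
import math
import random

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def llm_rank_reward(item_id, llm_ranking):
    """Hypothetical stand-in for LLM feedback: reciprocal rank of the item
    in the LLM's ranked list, or 0.0 if the LLM did not rank it."""
    if item_id in llm_ranking:
        return 1.0 / (llm_ranking.index(item_id) + 1)
    return 0.0

def reinforce_step(query_vec, item_vecs, llm_ranking, lr=0.5, rng=random):
    """One REINFORCE update of the query embedding (simplifying assumption).

    Policy: pi(i | q) = softmax_i(q . e_i). Sample one item, score it with
    the LLM-feedback stub, and move q along
    (reward - baseline) * grad_q log pi(i | q).
    """
    ids = list(item_vecs)
    probs = softmax([dot(query_vec, item_vecs[i]) for i in ids])
    sampled = rng.choices(ids, weights=probs, k=1)[0]
    reward = llm_rank_reward(sampled, llm_ranking)
    # Expected reward under the current policy, used as a variance-reducing baseline.
    baseline = sum(p * llm_rank_reward(i, llm_ranking) for p, i in zip(probs, ids))
    # grad_q log pi(sampled | q) = e_sampled - sum_i pi(i | q) * e_i
    dim = len(query_vec)
    expected = [sum(p * item_vecs[i][d] for p, i in zip(probs, ids)) for d in range(dim)]
    grad = [item_vecs[sampled][d] - expected[d] for d in range(dim)]
    return [q + lr * (reward - baseline) * g for q, g in zip(query_vec, grad)]
```

Calling `reinforce_step` repeatedly with a ranking in which the LLM prefers one item shifts retrieval probability toward that item, which is the alignment effect the bullet describes.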
Architecture
Figure 1 / Figure 2 (conceptual)
The two-stage retrieval augmented conversational recommendation workflow and the iterative RL feedback loop.
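As a rough illustration of the two-stage workflow, the sketch below assumes cosine-similarity retrieval over item embeddings followed by a generator call that ranks the retrieved candidates. `ItemIndex`, `generate`, and `echo_llm` are hypothetical names introduced here, and `echo_llm` is only a placeholder for a real black-box LLM call. The summary does not specify the paper's actual retriever, prompt, or corpus format.

```python
import math

def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0

class ItemIndex:
    """Toy in-memory corpus mapping item title -> embedding (hypothetical)."""

    def __init__(self):
        self.items = {}

    def add(self, title, embedding):
        # A new item becomes retrievable immediately; the generator is untouched.
        self.items[title] = embedding

    def retrieve(self, query_embedding, k=2):
        # Stage 1: rank all items by cosine similarity and keep the top k.
        ranked = sorted(
            self.items,
            key=lambda t: cosine(query_embedding, self.items[t]),
            reverse=True,
        )
        return ranked[:k]

def generate(dialogue, candidates, llm):
    """Stage 2: ask the black-box LLM to rank the retrieved candidates."""
    prompt = (
        f"Conversation: {dialogue}\n"
        f"Candidates: {', '.join(candidates)}\n"
        "Rank the candidates for this user."
    )
    return llm(prompt, candidates)

def echo_llm(prompt, candidates):
    # Placeholder for a real LLM call: returns the candidates in the order given.
    return list(candidates)
```

In this sketch, adding a newly released title through `ItemIndex.add` changes what stage 1 returns without any change to the generator, which mirrors the motivation for keeping the retriever lightweight and separate.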
Breakthrough Assessment
7/10
Introduces a practical RL-based alignment loop for two-stage CRS and provides a valuable large-scale metadata corpus, though empirical results are omitted in the provided text.