University of Science and Technology of China,
Meta AI
arXiv (2024)
Recommendation, Agent, Memory, P13N
📝 Paper Summary
LLM-based Recommendation, User Simulation
AFL establishes an iterative communication loop where a recommendation agent and a user agent exchange feedback and rationales to simultaneously improve item suggestions and user behavior simulation.
Core Problem
Existing research optimizes recommendation agents or user agents in isolation, ignoring the reciprocal feedback loop (conversational adjustments and preference discovery) that characterizes real-world user-recommender interactions.
Why it matters:
Isolating agents misses the opportunity for the recommender to refine its understanding through user feedback
Single-turn user agents fail to simulate the dynamic process of users discovering their interests through interaction
Real-world feedback loops often amplify popularity and position biases, requiring robust modeling solutions
Concrete Example: In a standard setup, a user agent might simply reject an item. In AFL, the user agent explains *why* (e.g., 'I dislike horror'), and that explanation is stored in memory. The recommender agent then uses this history to adjust its next suggestion, while the user agent refines its own preference model based on the recommender's rationales.
Key Novelty
Agentic Feedback Loop (AFL)
Simulates a dialogue where the Recommender Agent provides items with reasons, and the User Agent provides feedback with reasons
Uses shared memory to store this interaction history, allowing both agents to iteratively update their reasoning and decisions within a single prediction session
Integrates a traditional recommendation model (as a tool for the Rec Agent) and a reward model (as a scorer for the User Agent) within an LLM-driven framework
Architecture
The framework of Agentic Feedback Loop (AFL) showing the interaction between the Recommendation Agent and User Agent.
Evaluation Highlights
+11.52% average improvement in recommendation performance compared to single recommendation agents
+21.12% average improvement in user simulation accuracy compared to single user agents
Demonstrates robustness by not exacerbating popularity or position bias, unlike real-world feedback loops
Breakthrough Assessment
7/10
Significantly improves performance by unifying two distinct tasks (recommendation and simulation) into a collaborative loop, addressing a logical gap in prior isolated agent approaches.
⚙️ Technical Details
Problem Definition
Setting: Sequential recommendation and user simulation based on interaction history
Inputs: User-item interaction history [I_1, ..., I_n]
Outputs: Next item prediction I_{n+1} (Rec task) or Like/Dislike decision (User Sim task)
Pipeline Flow
Recommendation Agent (Proposes Item + Rationale)
User Agent (Evaluates Item + Rationale → Feedback)
Memory Update (Stores interaction)
Loop repeats if feedback is negative; terminates if positive
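The pipeline above can be sketched as a short loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Memory` class, the agent signatures, the toy agents, and the `max_iters` cap are hypothetical stand-ins for the LLM calls and memory format used in AFL.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Memory:
    """Hypothetical record of (item, rec_reason, liked, user_reason) turns."""
    turns: List[Tuple[str, str, bool, str]] = field(default_factory=list)


# Assumed agent signatures; in AFL these would be prompts sent to an LLM.
RecAgent = Callable[[List[str], Memory], Tuple[str, str]]
UserAgent = Callable[[List[str], str, str, Memory], Tuple[bool, str]]


def feedback_loop(history: List[str], rec_agent: RecAgent,
                  user_agent: UserAgent, max_iters: int = 3) -> Tuple[str, Memory]:
    """Propose, evaluate, store; repeat on negative feedback, stop on positive."""
    memory = Memory()
    item = ""
    for _ in range(max_iters):
        item, rec_reason = rec_agent(history, memory)
        liked, user_reason = user_agent(history, item, rec_reason, memory)
        memory.turns.append((item, rec_reason, liked, user_reason))
        if liked:
            break
    return item, memory


# Toy stand-ins for the two agents (illustrative only).
CANDIDATES = ["Horror Night", "Space Comedy", "Quiet Drama"]


def toy_rec_agent(history: List[str], memory: Memory) -> Tuple[str, str]:
    # Skip items the user already rejected earlier in this session.
    rejected = {turn[0] for turn in memory.turns if not turn[2]}
    for candidate in CANDIDATES:
        if candidate not in rejected and candidate not in history:
            return candidate, f"Similar to {history[-1]}"
    return CANDIDATES[-1], "No untried candidates left"


def toy_user_agent(history: List[str], item: str, rec_reason: str,
                   memory: Memory) -> Tuple[bool, str]:
    if "Horror" in item:
        return False, "I dislike horror"
    return True, "Matches my taste"


item, memory = feedback_loop(["Galaxy Laughs"], toy_rec_agent, toy_user_agent)
print(item, len(memory.turns))
```

Here the first suggestion is rejected with a reason, the rejection is stored, and the second suggestion avoids the rejected item, which mirrors the repeat-on-negative, stop-on-positive rule.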
System Modules
Recommendation Agent (Agents)
Suggests items and provides reasoning based on history and memory
Model or implementation: GPT-4o-mini
Recommendation Model (Tools)
Provides initial item candidates to the Rec Agent based on training data
Model or implementation: Interchangeable (e.g., standard sequential recommender)
User Agent (Agents)
Simulates user response (Like/Dislike) and provides reasons
Model or implementation: GPT-4o-mini
Reward Model (Tools)
Assigns a numerical relevance score to recommended items to guide the User Agent
Model or implementation: SASRec (Self-Attentive Sequential Recommendation)
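One way a reward-model score could guide the User Agent is to include it in the agent's prompt. The sketch below assumes that design; the summary does not specify how the score is consumed. `toy_scorer` is a hypothetical stand-in for a trained SASRec model, and `GENRES` and the prompt wording are invented for illustration.

```python
from typing import Callable, List

# Assumed scorer type: maps (history, candidate item) to a relevance score.
Scorer = Callable[[List[str], str], float]

# Hypothetical toy metadata used only by the stand-in scorer.
GENRES = {"Alien": "horror", "Saw": "horror", "Up": "animation", "Coco": "animation"}


def toy_scorer(history: List[str], item: str) -> float:
    """Fraction of history items that share the candidate's genre."""
    if not history:
        return 0.0
    same = sum(GENRES[h] == GENRES[item] for h in history)
    return same / len(history)


def build_user_prompt(history: List[str], item: str, rec_reason: str,
                      scorer: Scorer) -> str:
    """Assemble a user-agent prompt that exposes the reward score as a hint."""
    score = scorer(history, item)
    return (
        f"You previously enjoyed: {', '.join(history)}.\n"
        f"Recommended item: {item}\n"
        f"Recommender's reason: {rec_reason}\n"
        f"Reward model score: {score:.2f}\n"
        "Decide Like or Dislike and explain why."
    )


prompt = build_user_prompt(["Up", "Alien"], "Coco", "You like animation", toy_scorer)
print(prompt)
```

Keeping the scorer behind a plain callable means a real sequential model could replace the toy function without changing the prompt-building code.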
Novel Architectural Elements
Iterative closed-loop architecture in which the Rec Agent's outputs become the User Agent's inputs, and vice versa, within a single inference session
Dual-memory system (M_r and M_u) maintaining distinct perspectives of the same conversation
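The dual-memory idea can be illustrated by storing each turn twice, once from each agent's point of view. This is a sketch under assumptions: only the names M_r and M_u come from the summary, while the `Turn` and `DualMemory` classes and the sentence templates are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Turn:
    item: str
    rec_reason: str
    liked: bool
    user_reason: str


@dataclass
class DualMemory:
    m_r: List[str] = field(default_factory=list)  # recommender's perspective
    m_u: List[str] = field(default_factory=list)  # user's perspective

    def record(self, turn: Turn) -> None:
        """Write the same turn into both memories, phrased per perspective."""
        verdict = "liked" if turn.liked else "disliked"
        self.m_r.append(
            f"I recommended '{turn.item}' because {turn.rec_reason}. "
            f"The user {verdict} it: {turn.user_reason}."
        )
        self.m_u.append(
            f"The recommender suggested '{turn.item}' because {turn.rec_reason}. "
            f"I {verdict} it: {turn.user_reason}."
        )

    def rec_context(self) -> str:
        """Text that would be appended to the Rec Agent's prompt."""
        return "\n".join(self.m_r)

    def user_context(self) -> str:
        """Text that would be appended to the User Agent's prompt."""
        return "\n".join(self.m_u)


mem = DualMemory()
mem.record(Turn("Horror Night", "it is trending", False, "I dislike horror"))
print(mem.rec_context())
print(mem.user_context())
```

Both memories describe the same event, but each agent reads it in the first person, which is one plausible reading of "distinct perspectives of the same conversation".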
Modeling
Base Model: GPT-4o-mini
Training Method: In-context learning with memory (no LLM fine-tuning reported)
Training Data:
Recommendation/Reward models (tools) are trained on LastFM, Steam, MovieLens datasets
Compute: Not reported in the paper
Comparison to Prior Work
vs. RecMind/MACRec: AFL incorporates a user agent loop to refine recommendations iteratively, rather than a single pass or internal collaboration
vs. Agent4Rec/RecLLM: AFL uses the user agent to reciprocally improve the recommender, not just for evaluation or data generation
vs. AgentCF [not cited in paper]: AgentCF simulates user-item interactions, whereas AFL emphasizes an explicit textual feedback loop and rationale exchange
Limitations
Relies on the quality of the fixed Reward Model (SASRec) for user simulation grounding
Iterative API calls to GPT-4o-mini for every recommendation may incur high latency and cost
Requires keeping the interaction history in the context window, which may grow large
Code is publicly available at https://github.com/Lanyu0303/AFL. Prompt templates for both agents are provided in Tables 1 and 2. The Reward Model is specified as SASRec.
📊 Experiments & Results
Evaluation Setup
Sequential recommendation and user simulation evaluation
Benchmarks:
LastFM (Music recommendation)
Steam (Game recommendation)
MovieLens (Movie recommendation)
Metrics:
Recommendation performance (specific metrics are not listed in this summary; standard recommender-system metrics are implied)
User simulation performance (specific metrics are not listed in this summary)
Statistical methodology: Not explicitly reported in the paper
Key Results
Benchmark | Metric | Baseline | This Paper | Δ
Average across datasets | Recommendation improvement (%) | 0.0 | 11.52 | +11.52
Average across datasets | User simulation improvement (%) | 0.0 | 21.12 | +21.12
Main Takeaways
AFL improves both recommendation (+11.52% on average) and user simulation (+21.12% on average) over single-agent baselines.
Performance improves as the maximum number of feedback-loop iterations increases.
The approach is robust and does not exacerbate popularity or position bias, unlike real-world feedback loops which often amplify them.
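A simple way to check whether a loop amplifies popularity bias is to compare the average popularity of the items recommended before and after the feedback iterations. The sketch below is an assumed diagnostic, not the measurement used in the paper; the function names and the toy data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def item_popularity(train_sequences: Sequence[Sequence[str]]) -> Dict[str, float]:
    """Share of all training interactions that involve each item."""
    counts = Counter(item for seq in train_sequences for item in seq)
    total = sum(counts.values())
    return {item: n / total for item, n in counts.items()}


def mean_popularity(recommended: List[str], popularity: Dict[str, float]) -> float:
    """Average popularity of recommended items; unseen items count as 0."""
    if not recommended:
        return 0.0
    return sum(popularity.get(i, 0.0) for i in recommended) / len(recommended)


# Toy interaction logs and two rounds of recommendations (illustrative only).
train = [["a", "b", "a"], ["a", "c"], ["b", "a"]]
pop = item_popularity(train)
first_round = ["a", "b", "c"]   # items recommended before any feedback
final_round = ["b", "c", "c"]   # items recommended after the loop ends

# A positive shift would suggest the loop drifts toward popular items.
shift = mean_popularity(final_round, pop) - mean_popularity(first_round, pop)
print(f"popularity shift: {shift:+.3f}")
```

In this toy data the shift is negative, meaning the final recommendations are less concentrated on popular items than the initial ones.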
📚 Prerequisite Knowledge
Prerequisites
Large Language Models (LLMs) and In-Context Learning
Sequential Recommendation
Agentic AI (Memory, Tools, Planning)
Key Terms
AFL: Agentic Feedback Loop, the proposed framework in which recommender and user agents iteratively communicate to refine outputs
SASRec: Self-Attentive Sequential Recommendation, a deep learning model used here as a reward model to score user-item compatibility
Chain-of-Thought: A prompting technique where the model explains its reasoning step-by-step before giving a final answer
Role-Playing: Prompting an LLM to adopt a specific persona (e.g., 'You are a movie enthusiast') to guide its behavior
Reward Model: A fixed model (here, SASRec) that predicts a numerical score for an item, used by the User Agent to ground its simulation in data
Popularity Bias: The tendency of recommender systems to recommend frequently interacted items over less popular ones