Agentic Feedback Loop Modeling Improves Recommendation and User Simulation

📝 Paper Summary

LLM-based Recommendation, User Simulation

AFL establishes an iterative communication loop where a recommendation agent and a user agent exchange feedback and rationales to simultaneously improve item suggestions and user behavior simulation.

Core Problem

Existing research optimizes recommendation agents or user agents in isolation, ignoring the reciprocal feedback loop (conversational adjustments and preference discovery) that characterizes real-world user-recommender interactions.

Why it matters:

Isolating agents misses the opportunity for the recommender to refine its understanding through user feedback
Single-turn user agents fail to simulate the dynamic process of users discovering their interests through interaction
Real-world feedback loops often amplify popularity and position biases, requiring robust modeling solutions

Concrete Example: In a standard setup, a user agent might simply reject an item. In AFL, the user agent explains *why* (e.g., 'I dislike horror'), and that explanation is stored in memory. The recommender agent then uses this history to adjust its next suggestion, while the user agent refines its own preference model based on the recommender's rationales.

Key Novelty

Agentic Feedback Loop (AFL)

Simulates a dialogue where the Recommender Agent provides items with reasons, and the User Agent provides feedback with reasons
Uses shared memory to store this interaction history, allowing both agents to iteratively update their reasoning and decisions within a single prediction session
Integrates a traditional recommendation model (as a tool for the Rec Agent) and a reward model (as a scorer for the User Agent) within an LLM-driven framework

Architecture

Figure: The Agentic Feedback Loop (AFL) framework, showing the interaction between the Recommendation Agent and the User Agent.

Evaluation Highlights

+11.52% average improvement in recommendation performance compared to single recommendation agents
+21.12% average improvement in user simulation accuracy compared to single user agents
Demonstrates robustness by not exacerbating popularity or position bias, unlike real-world feedback loops

Breakthrough Assessment

7/10

Significantly improves performance by unifying two distinct tasks (recommendation and simulation) into a collaborative loop, addressing a logical gap in prior isolated agent approaches.

⚙️ Technical Details

Problem Definition

Setting: Sequential recommendation and user simulation based on interaction history

Inputs: User-item interaction history [I_1, ..., I_n]

Outputs: Next item prediction I_{n+1} (Rec task) or Like/Dislike decision (User Sim task)

Pipeline Flow

Recommendation Agent (Proposes Item + Rationale)
User Agent (Evaluates Item + Rationale → Feedback)
Memory Update (Stores interaction)
Loop repeats if feedback is negative; terminates if positive
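
The pipeline above can be sketched as a simple control loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `run_afl`, `Turn`, `toy_rec_agent`, `toy_user_agent`, and the `max_iters` cap are hypothetical, and the two agents are rule-based stubs standing in for LLM calls.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Turn:
    """One round of the loop: a proposal plus the feedback it received."""
    item: str
    rec_reason: str
    liked: bool
    user_reason: str


# Hypothetical agent signatures; in AFL both agents are LLM-driven.
RecAgent = Callable[[List[str], List[Turn]], Tuple[str, str]]
UserAgent = Callable[[List[str], str, str, List[Turn]], Tuple[bool, str]]


def run_afl(history: List[str], rec_agent: RecAgent, user_agent: UserAgent,
            max_iters: int = 4) -> Tuple[str, List[Turn]]:
    """Propose, evaluate, store, and repeat until positive feedback or the cap."""
    if max_iters < 1:
        raise ValueError("max_iters must be at least 1")
    memory: List[Turn] = []
    item = ""
    for _ in range(max_iters):
        item, rec_reason = rec_agent(history, memory)          # propose item + rationale
        liked, user_reason = user_agent(history, item, rec_reason, memory)
        memory.append(Turn(item, rec_reason, liked, user_reason))  # memory update
        if liked:                                              # terminate on positive feedback
            break
    return item, memory


def toy_rec_agent(history: List[str], memory: List[Turn]) -> Tuple[str, str]:
    """Stub recommender: picks the first candidate not yet rejected."""
    rejected = {turn.item for turn in memory}
    candidates = ["Horror Night", "Space Opera", "Quiet Drama"]  # made-up titles
    for candidate in candidates:
        if candidate not in rejected:
            return candidate, "It has not been rejected so far."
    return candidates[-1], "No untried candidates remain."


def toy_user_agent(history: List[str], item: str, rec_reason: str,
                   memory: List[Turn]) -> Tuple[bool, str]:
    """Stub user: dislikes anything with 'Horror' in the title."""
    if "Horror" in item:
        return False, "I dislike horror."
    return True, "This matches my taste."


final_item, transcript = run_afl(["Star Trek"], toy_rec_agent, toy_user_agent)
```

With these stubs, the first proposal is rejected with a reason, the reason lands in memory, and the second proposal avoids the rejected item. Capping the loop with `max_iters` mirrors the "maximum number of iterations" setting mentioned in the results.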

System Modules

Recommendation Agent (Agents)

Suggests items and provides reasoning based on history and memory

Model or implementation: GPT-4o-mini

Recommendation Model (Tools)

Provides initial item candidates to the Rec Agent based on training data

Model or implementation: Interchangeable (e.g., standard sequential recommender)

User Agent (Agents)

Simulates user response (Like/Dislike) and provides reasons

Model or implementation: GPT-4o-mini

Reward Model (Tools)

Assigns a numerical relevance score to recommended items to guide the User Agent

Model or implementation: SASRec (Self-Attentive Sequential Recommendation)
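
To illustrate how a numerical relevance score could guide the User Agent, the sketch below uses a deliberately simple stand-in scorer (mean-pooled history embedding, dot product, sigmoid). This is not SASRec, which uses self-attention over the sequence; the embeddings, `score_item`, and `reward_hint` are hypothetical names and values chosen only for illustration.

```python
import math
from typing import Dict, List

# Hypothetical 2-d item embeddings; a trained sequential model would learn these.
EMBEDDINGS: Dict[str, List[float]] = {
    "Alien": [1.0, 0.0],
    "Blade Runner": [0.8, 0.2],
    "Notting Hill": [0.0, 1.0],
}


def score_item(history: List[str], candidate: str) -> float:
    """Stand-in relevance score in (0, 1): sigmoid of pooled-history dot candidate."""
    if not history:
        raise ValueError("history must contain at least one item")
    cand_vec = EMBEDDINGS[candidate]
    dim = len(cand_vec)
    # Mean-pool the history embeddings (a simplification of sequence modeling).
    pooled = [sum(EMBEDDINGS[h][d] for h in history) / len(history) for d in range(dim)]
    logit = sum(p * c for p, c in zip(pooled, cand_vec))
    return 1.0 / (1.0 + math.exp(-logit))


def reward_hint(history: List[str], candidate: str) -> str:
    """Format the score as a line that could be placed in the User Agent's prompt."""
    return f"Reward model score for '{candidate}': {score_item(history, candidate):.2f}"


hint = reward_hint(["Alien"], "Blade Runner")
```

The design point is that the score is injected as evidence for the LLM to weigh, so the simulated Like/Dislike decision stays anchored to interaction data rather than to the LLM's priors alone.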

Novel Architectural Elements

Iterative closed-loop architecture where outputs from Rec Agent become inputs for User Agent and vice-versa within a single inference session
Dual-memory system (M_r and M_u) maintaining distinct perspectives of the same conversation
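
One way to picture the dual-memory idea is a structure that records each round twice, once from each agent's point of view. The sketch below is an assumption about how M_r and M_u might be organized; the class `DualMemory`, its `record` method, and the sentence templates are hypothetical and do not come from the paper's prompts.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DualMemory:
    """Two views of the same conversation: m_r for the recommender, m_u for the user."""
    m_r: List[str] = field(default_factory=list)
    m_u: List[str] = field(default_factory=list)

    def record(self, item: str, rec_reason: str, liked: bool, user_reason: str) -> None:
        """Store one round in both memories, phrased from each agent's perspective."""
        verdict = "liked" if liked else "disliked"
        self.m_r.append(
            f"I recommended {item} because {rec_reason}. "
            f"The user {verdict} it: {user_reason}"
        )
        self.m_u.append(
            f"The recommender suggested {item} because {rec_reason}. "
            f"I {verdict} it: {user_reason}"
        )


memory = DualMemory()
memory.record("Saw", "it is popular among similar users", False, "I dislike horror.")
```

Both lists describe the same events, but each entry is worded so that it can be dropped directly into the corresponding agent's prompt.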

Modeling

Base Model: GPT-4o-mini

Training Method: In-context learning with memory (no LLM fine-tuning reported)

Training Data:

Recommendation/Reward models (tools) are trained on LastFM, Steam, MovieLens datasets

Compute: Not reported in the paper

Comparison to Prior Work

vs. RecMind/MACRec: AFL incorporates a user agent loop to refine recommendations iteratively, rather than a single pass or internal collaboration
vs. Agent4Rec/RecLLM: AFL uses the user agent to reciprocally improve the recommender, not just for evaluation or data generation
vs. AgentCF [not cited in paper]: AgentCF simulates user-item interactions, whereas AFL emphasizes the explicit textual feedback loop and rationale exchange

Limitations

Relies on the quality of the fixed Reward Model (SASRec) for user simulation grounding
Iterative API calls to GPT-4o-mini for every recommendation may incur high latency and cost
Requires maintaining the interaction history in the context window, which may grow large

Reproducibility

Code: https://github.com/Lanyu0303/AFL

Prompt templates for both agents are provided in Tables 1 and 2 of the paper. The Reward Model is specified as SASRec.

📊 Experiments & Results

Evaluation Setup

Sequential recommendation and user simulation evaluation

Benchmarks:

LastFM (Music recommendation)
Steam (Game recommendation)
MovieLens (Movie recommendation)

Metrics:

Recommendation performance (specific metrics are not listed in the summarized excerpt; standard recommender-system metrics are implied)
User simulation performance (specific metrics are not listed in the summarized excerpt)
Statistical methodology: Not explicitly reported in the paper

Key Results

Task	Baseline	Average relative improvement across datasets
Recommendation	Single recommendation agent	+11.52%
User simulation	Single user agent	+21.12%
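
The reported figures are percentage gains over single-agent baselines, averaged across datasets. The sketch below shows one common way such a number is computed: per-dataset relative improvement, then an unweighted mean. Whether the paper averages in exactly this way is an assumption, and the dataset names and metric values in the example are invented placeholders, not results from the paper.

```python
from typing import Dict, Tuple


def relative_improvement(baseline: float, ours: float) -> float:
    """Percentage gain of `ours` over `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (ours - baseline) / baseline * 100.0


def average_improvement(scores: Dict[str, Tuple[float, float]]) -> float:
    """Unweighted mean of per-dataset relative improvements."""
    gains = [relative_improvement(base, ours) for base, ours in scores.values()]
    return sum(gains) / len(gains)


# Placeholder (baseline, ours) metric values for two made-up datasets.
example_scores = {
    "dataset_a": (0.40, 0.45),  # about a 12.5% gain
    "dataset_b": (0.50, 0.55),  # about a 10% gain
}
avg_gain = average_improvement(example_scores)  # about 11.25
```

Averaging relative gains treats every dataset equally regardless of its absolute metric level, which is why a headline percentage should be read alongside the per-dataset numbers in the paper.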

Main Takeaways

AFL yields significant improvements for both recommendation (+11.52%) and user simulation (+21.12%) tasks compared to single-agent baselines.
Performance improves as the maximum number of feedback-loop iterations increases.
The approach is robust and does not exacerbate popularity or position bias, unlike real-world feedback loops which often amplify them.
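
A claim about popularity bias can be checked by comparing how often recommendations fall among the most-interacted items with and without the loop. The sketch below is one simple diagnostic under stated assumptions; the paper's actual bias analysis may differ, and `popular_items`, `popularity_share`, and the toy logs are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set


def popular_items(interactions: Iterable[str], top_k: int) -> Set[str]:
    """Return the top_k most frequently interacted items in the training logs."""
    counts = Counter(interactions)
    return {item for item, _ in counts.most_common(top_k)}


def popularity_share(recommendations: List[str], popular: Set[str]) -> float:
    """Fraction of recommendations that are popular items."""
    if not recommendations:
        raise ValueError("recommendations must not be empty")
    return sum(rec in popular for rec in recommendations) / len(recommendations)


# Toy interaction log: item "a" is the single most popular item.
logs = ["a", "a", "a", "b", "b", "c"]
head = popular_items(logs, top_k=1)

share_without_loop = popularity_share(["a", "c", "b", "a"], head)
share_with_loop = popularity_share(["a", "c", "b", "c"], head)
```

If the share with the loop is not higher than the share without it, the loop has not pushed recommendations further toward popular items on that sample.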

📚 Prerequisite Knowledge

Prerequisites

Large Language Models (LLMs) and In-Context Learning
Sequential Recommendation
Agentic AI (Memory, Tools, Planning)

Key Terms

AFL: Agentic Feedback Loop, the proposed framework in which recommender and user agents iteratively communicate to refine their outputs

SASRec: Self-Attentive Sequential Recommendation, a self-attention-based sequential recommender used here as a reward model to score user-item compatibility

Chain-of-Thought: A prompting technique where the model explains its reasoning step-by-step before giving a final answer

Role-Playing: Prompting an LLM to adopt a specific persona (e.g., 'You are a movie enthusiast') to guide its behavior

Reward Model: A fixed model (here, SASRec) that predicts a numerical score for an item, used by the User Agent to ground its simulation in data

Popularity Bias: The tendency of recommender systems to recommend frequently interacted items over less popular ones