Diagnostic-Guided Dynamic Profile Optimization for LLM-based User Simulators in Sequential Recommendation

📝 Paper Summary

User Simulation, Agentic Recommender Systems

DGDPO refines user simulator profiles by iteratively diagnosing defects using a specialized model and correcting them with a generalized model, enabling realistic multi-round evolution with sequential recommenders.

Core Problem

Existing LLM-based user simulators rely on static, single-step profiles that cannot correct initial inaccuracies or adapt to evolving interests, and fail to simulate realistic multi-round feedback loops.

Why it matters:

Static profiles cause simulated behavior to progressively diverge from real user actions as errors persist uncorrected
Current simulators mostly use single-round interactions with static recommenders, failing to capture how real users and systems mutually adapt over time
General-purpose LLMs hallucinate when asked to self-diagnose profile defects, leading to unreliable updates

Concrete Example: If an initial profile incorrectly states a user 'dislikes comedy' (Inaccurate), the simulator will consistently reject comedy recommendations. A standard LLM might fail to identify this contradiction from interaction history, whereas DGDPO diagnoses the specific defect and updates the profile.

Key Novelty

Diagnostic-Guided Dynamic Profile Optimization (DGDPO)

Decouples the optimization into a 'Diagnostic' phase (identifying specific defects like inaccuracy or incompleteness) and a 'Treatment' phase (generating fixes)
Uses a 'Specialized' small LLM for reliable diagnosis (trained on synthetic defects) and a 'Generalized' large LLM for complex reasoning and profile rewriting
Integrates the simulator with Sequential Recommenders (SRs) to enable a bidirectional evolution where both the user profile and the recommender strategy update based on interaction history

Architecture

The DGDPO framework workflow involving the Diagnostic Module, Treatment Module, and interaction with Sequential Recommenders.

Evaluation Highlights

Specialized diagnostic module achieves 92.20% average accuracy on profile defect identification
General-purpose LLMs (without specialized training) achieve only 62.78% accuracy on the same defect identification task
Demonstrates effective identification of 'Inaccurate', 'Incomplete', and combined profile defects compared to baselines

Breakthrough Assessment

7/10

Addresses a critical bottleneck in user simulation (static/hallucinated profiles) with a logical diagnostic-treatment split. The integration with Sequential Recommenders for bidirectional evolution is a significant step toward realistic simulation.

⚙️ Technical Details

Problem Definition

Setting: User simulation in Sequential Recommendation where a simulator interacts with items to mimic real user behavior

Inputs: User's historical interaction sequence split into initial history (D_ini) and optimization history (D_opt)

Outputs: Optimized User Profile (P_opt) that accurately reflects user preferences
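
As a minimal sketch of the input split described above, the snippet below cuts a chronologically ordered history into D_ini and D_opt. The function name `split_history`, the `init_fraction` parameter, and the 50/50 ratio are assumptions for illustration; the summary does not state how the paper chooses the split point.

```python
from typing import List, Tuple, TypeVar

T = TypeVar("T")


def split_history(history: List[T], init_fraction: float = 0.5) -> Tuple[List[T], List[T]]:
    """Split a chronologically ordered history into (D_ini, D_opt).

    init_fraction is a hypothetical knob; the actual ratio is not given in the summary.
    """
    if not 0.0 < init_fraction < 1.0:
        raise ValueError("init_fraction must be strictly between 0 and 1")
    # Keep at least one interaction for building the initial profile.
    cut = max(1, int(len(history) * init_fraction))
    return history[:cut], history[cut:]


# Toy history of (item, rating) pairs, oldest first.
history = [("item_a", 5), ("item_b", 2), ("item_c", 4), ("item_d", 3)]
d_ini, d_opt = split_history(history)
```

Here `d_ini` would feed the initial profile generation and `d_opt` would supply the interactions against which the simulator is checked.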

Pipeline Flow

Data Splitting: History split into Initialization and Optimization sets
Initial Profile Generation: Static prompt-based inference
Discrepancy Detection: Identify mismatches between simulator and real user decisions
Diagnostic Module: Specialized LLM predicts defect type (Inaccurate/Incomplete)
Treatment Module: Generalized LLM generates explanation and modification to update profile
Multi-round Interaction: Updated simulator interacts with Sequential Recommender
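
The optimization steps above (discrepancy detection, diagnosis, treatment) can be sketched as a simple loop. This is an illustrative skeleton, not the authors' implementation: the LLM-backed components are replaced by hypothetical rule-based stand-ins (`toy_simulate`, `toy_diagnose`, `toy_treat`), a profile is assumed to be a list of short sentences, and a rating at or above the threshold is assumed to mean the real user accepted the item.

```python
from dataclasses import dataclass
from typing import Callable, List

Profile = List[str]  # assumption: a profile is a list of short preference sentences


@dataclass
class Interaction:
    item: str
    rating: int  # real user's rating on a 1-5 scale


def optimize_profile(
    profile: Profile,
    d_opt: List[Interaction],
    simulate: Callable[[Profile, str], bool],
    diagnose: Callable[[Profile, Interaction], str],
    treat: Callable[[Profile, Interaction, str], Profile],
    rating_threshold: int = 3,
) -> Profile:
    """Walk through D_opt and repair the profile at every discrepancy case."""
    for interaction in d_opt:
        real_accepts = interaction.rating >= rating_threshold
        if simulate(profile, interaction.item) == real_accepts:
            continue  # simulator agrees with the real user, nothing to fix
        defect = diagnose(profile, interaction)        # "Inaccurate" or "Incomplete"
        profile = treat(profile, interaction, defect)  # rewritten profile
    return profile


# Hypothetical rule-based stand-ins for the specialized and generalized LLMs.
def toy_simulate(profile: Profile, item: str) -> bool:
    return f"likes {item}" in profile


def toy_diagnose(profile: Profile, interaction: Interaction) -> str:
    mentioned = any(s.endswith(" " + interaction.item) for s in profile)
    return "Inaccurate" if mentioned else "Incomplete"


def toy_treat(profile: Profile, interaction: Interaction, defect: str) -> Profile:
    # The toy fix is the same for both defect types: drop any sentence about
    # the item, then append one that matches the real user's rating.
    verb = "likes" if interaction.rating >= 3 else "dislikes"
    kept = [s for s in profile if not s.endswith(" " + interaction.item)]
    return kept + [f"{verb} {interaction.item}"]


initial = ["likes drama", "dislikes comedy"]
d_opt = [Interaction("comedy", 5), Interaction("sci-fi", 4), Interaction("drama", 4)]
optimized = optimize_profile(initial, d_opt, toy_simulate, toy_diagnose, toy_treat)
print(optimized)
```

In this toy run the "dislikes comedy" sentence is diagnosed as Inaccurate and replaced, the missing sci-fi preference is diagnosed as Incomplete and added, and the drama interaction triggers no update because the simulator already agrees with the real user.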

System Modules

Diagnostic Module (Profile Optimization)

Identifies the specific type of defect in the user profile based on a discrepancy case

Model or implementation: Specialized LLM (a small-scale general-purpose LLM, pre-trained on valid simulation traces and then fine-tuned for defect diagnosis)

Treatment Module (Profile Optimization)

Analyzes the diagnosed defect and generates a refined user profile

Model or implementation: Generalized LLM (Large-scale, e.g., GPT-4)

Sequential Recommender (SR)

Provides recommendations to the simulator and updates its own state based on feedback

Model or implementation: Sequential Recommendation Model (e.g., Transformer-based)

Novel Architectural Elements

Decoupled Diagnostic-Treatment architecture where a specialized small model guides a generalized large model
Integration of dynamic profile optimization within a sequential recommendation loop (bidirectional evolution)
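
The bidirectional evolution mentioned above can be pictured with a toy loop in which both sides change state every round. Everything here is a hypothetical stand-in: `ToyRecommender` is a score counter rather than a real sequential model, the profile is a plain set of liked items, and the correction signal comes from a known set of true likes purely for illustration (the summary does not describe how the paper obtains that signal inside the loop).

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


class ToyRecommender:
    """Hypothetical stand-in for a sequential recommender."""

    def __init__(self, catalog: List[str]) -> None:
        self.catalog = catalog
        self.scores: Counter = Counter()

    def recommend(self) -> str:
        # Highest score wins; ties fall back to catalog order.
        return max(self.catalog, key=lambda item: self.scores[item])

    def update(self, item: str, accepted: bool) -> None:
        self.scores[item] += 1 if accepted else -1


def run_rounds(
    recommender: ToyRecommender,
    profile_likes: Iterable[str],
    true_likes: Set[str],
    rounds: int,
) -> Tuple[List[str], Set[str]]:
    profile = set(profile_likes)
    shown: List[str] = []
    for _ in range(rounds):
        item = recommender.recommend()
        shown.append(item)
        # Profile side: repair the profile when it disagrees with the real user
        # (stand-in for the diagnostic and treatment modules).
        if (item in profile) != (item in true_likes):
            if item in true_likes:
                profile.add(item)
            else:
                profile.discard(item)
        # Recommender side: learn from the updated simulator's feedback.
        recommender.update(item, item in profile)
    return shown, profile


rec = ToyRecommender(["comedy", "drama", "horror"])
shown, final_profile = run_rounds(rec, {"comedy"}, {"drama"}, rounds=4)
print(shown, final_profile)
```

The point of the sketch is only the shape of the loop: each round updates the profile first and the recommender second, so later recommendations depend on earlier profile corrections.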

Modeling

Base Model: Generalized LLM for treatment (e.g., GPT series); Specialized small LLM for diagnosis

Training Method: Domain-adaptive pre-training on valid simulation traces, followed by Supervised Fine-Tuning (SFT) on synthetic defect data

Objective Functions:

Purpose: Pre-training to learn domain semantics.

Formally: Next-token prediction loss on valid simulation traces: $\mathcal{L}_{\mathrm{Pre}}(\theta) = -\sum_{t} \log P_\theta(x_t \mid x_{<t})$

Purpose: Fine-tuning for defect diagnosis.

Formally: Next-token prediction loss computed only on output tokens (input tokens are masked): $\mathcal{L}_{\mathrm{FT}}(\theta) = -\sum_{t \in \mathrm{Output}} \log P_\theta(y_t \mid y_{<t}, \mathrm{Input})$
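
To make the difference between the two objectives concrete, the sketch below computes both as negative log-likelihoods to be minimized, given per-token log-probabilities. The function names and the boolean mask representation are assumptions for illustration; a real training setup would obtain the log-probabilities from the model and typically average over a batch.

```python
import math
from typing import Sequence


def next_token_nll(log_probs: Sequence[float]) -> float:
    """Pre-training style loss: every token in the trace contributes."""
    return -sum(log_probs)


def masked_nll(log_probs: Sequence[float], is_output: Sequence[bool]) -> float:
    """Fine-tuning style loss: only tokens flagged as output contribute."""
    if len(log_probs) != len(is_output):
        raise ValueError("log_probs and is_output must have the same length")
    return -sum(lp for lp, keep in zip(log_probs, is_output) if keep)


# Toy sequence: two input (prompt) tokens followed by two output tokens.
token_probs = [0.5, 0.25, 0.5, 0.125]
log_probs = [math.log(p) for p in token_probs]
is_output = [False, False, True, True]

full_loss = next_token_nll(log_probs)        # equals log(128)
ft_loss = masked_nll(log_probs, is_output)   # equals log(16)
```

Masking the input means the diagnostic model is not penalized for failing to reproduce the prompt, only for the defect label and any other output tokens it is supposed to generate.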

Training Data:

Pre-training: Corpus of cases where simulator decisions matched real user high-rated interactions
Fine-tuning: Synthetic profiles created by injecting defects (flipping sentiment for 'Inaccurate', deleting text for 'Incomplete') into valid profiles
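
The synthetic defect injection described above can be sketched as two small transformations that each return a corrupted profile together with its label. The sentence-list profile format, the `likes`/`dislikes` vocabulary, and the function names are assumptions; the paper's actual generation pipeline is likely richer than this word-level flip.

```python
from typing import List, Tuple

# Hypothetical sentiment vocabulary used for the flip.
FLIP = {"likes": "dislikes", "dislikes": "likes"}


def inject_inaccurate(profile: List[str], index: int) -> Tuple[List[str], str]:
    """Flip the sentiment word in one sentence to create an 'Inaccurate' defect."""
    words = profile[index].split()
    flipped = [FLIP.get(w, w) for w in words]
    if flipped == words:
        raise ValueError("sentence has no sentiment word to flip")
    corrupted = list(profile)  # copy so the valid profile stays untouched
    corrupted[index] = " ".join(flipped)
    return corrupted, "Inaccurate"


def inject_incomplete(profile: List[str], index: int) -> Tuple[List[str], str]:
    """Delete one sentence to create an 'Incomplete' defect."""
    corrupted = profile[:index] + profile[index + 1:]
    return corrupted, "Incomplete"


valid_profile = ["likes sci-fi", "dislikes horror"]
inaccurate_example = inject_inaccurate(valid_profile, 0)
incomplete_example = inject_incomplete(valid_profile, 0)
```

Because the defect is injected deliberately, each corrupted profile comes with a known label, which is what makes supervised training of the diagnostic module possible.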

Key Hyperparameters:

rating_threshold: >= 3 (out of 5) for valid interactions

Compute: Not reported in the paper

Comparison to Prior Work

vs. Agent4Rec: DGDPO updates profiles dynamically step-by-step rather than using a fixed initial profile
vs. Self-reflection methods: DGDPO uses a specialized trained module for diagnosis to reduce hallucinations, achieving 92.20% accuracy versus 62.78% for general-purpose LLMs

Limitations

Relies on 'Discrepancy Cases' (mismatch with ground truth) to trigger updates, which assumes historical data is the absolute ground truth
Requires training a specialized diagnostic module, adding complexity compared to zero-shot prompting
Evaluation of sequential interaction depends on the quality of the underlying Sequential Recommender

Reproducibility

The paper describes the synthetic data generation pipeline for the diagnostic module in detail. The provided text does not state whether code is available, and it does not specify the parameter count of the 'small-scale' diagnostic LLM.

📊 Experiments & Results

Evaluation Setup

Accuracy analysis of profile defect identification; a recommendation performance simulation is implied but not detailed in the provided text

Benchmarks:

Three real-world datasets (Sequential Recommendation / User Simulation)

Metrics:

Defect Identification Accuracy
Statistical methodology: Not explicitly reported in the paper

Key Results

Benchmark	Metric	Baseline (general-purpose LLM)	This Paper (specialized diagnostic module)	Δ
Profile Defect Identification	Average accuracy (%)	62.78	92.20	+29.42

Main Takeaways

General-purpose LLMs struggle with specific diagnostic tasks (62.78% accuracy), justifying the need for a specialized diagnostic module.
The Diagnostic-Treatment framework allows for targeted profile refinement (addressing Inaccurate vs Incomplete specifically) rather than generic regeneration.
Synthetic data generation (flipping sentiments, deleting details) is effective for training the diagnostic module.

📚 Prerequisite Knowledge

Prerequisites

Large Language Models (LLMs) in Recommender Systems
User Simulation concepts (Profile, Memory, Action)
Sequential Recommendation

Key Terms

DGDPO: Diagnostic-Guided Dynamic Profile Optimization—the proposed framework for iteratively refining user profiles

Sequential Recommenders (SRs): Recommender systems that model user interests as evolving sequences rather than static preferences

Discrepancy Case: An instance where the simulated user's behavior (e.g., rejecting an item) contradicts the real user's historical behavior (e.g., clicking the item)

Domain-Adaptive Pre-training: Training the diagnostic model on valid simulator interaction traces to learn domain logic before fine-tuning

Defect-Specific Fine-tuning: Fine-tuning the diagnostic model on synthetic data containing artificially injected profile errors (inaccurate/incomplete)

Inaccurate Defect: Profile contains descriptions contradicting real behavior (e.g., 'dislikes sci-fi' when user watches sci-fi)

Incomplete Defect: Profile lacks necessary descriptions to explain a user's interaction (e.g., missing 'likes sci-fi')