DGDPO refines user simulator profiles by iteratively diagnosing defects using a specialized model and correcting them with a generalized model, enabling realistic multi-round evolution with sequential recommenders.
Core Problem
Existing LLM-based user simulators rely on static, single-step profiles that cannot correct initial inaccuracies or adapt to evolving interests, and fail to simulate realistic multi-round feedback loops.
Why it matters:
Static profiles cause simulated behavior to progressively diverge from real user actions as errors persist uncorrected
Current simulators mostly use single-round interactions with static recommenders, failing to capture how real users and systems mutually adapt over time
General-purpose LLMs hallucinate when asked to self-diagnose profile defects, leading to unreliable updates
Concrete Example: If an initial profile incorrectly states that a user 'dislikes comedy' (an Inaccurate defect), the simulator will consistently reject comedy recommendations. A standard LLM may fail to identify this contradiction from the interaction history, whereas DGDPO diagnoses the specific defect and updates the profile.
Key Novelty
Decouples the optimization into a 'Diagnostic' phase (identifying specific defects like inaccuracy or incompleteness) and a 'Treatment' phase (generating fixes)
Uses a 'Specialized' small LLM for reliable diagnosis (trained on synthetic defects) and a 'Generalized' large LLM for complex reasoning and profile rewriting
Integrates the simulator with Sequential Recommenders (SRs) to enable a bidirectional evolution where both the user profile and the recommender strategy update based on interaction history
Architecture
Figure: The DGDPO workflow, comprising the Diagnostic Module, the Treatment Module, and the interaction with Sequential Recommenders.
Evaluation Highlights
Specialized diagnostic module achieves 92.20% average accuracy on profile defect identification
General-purpose LLMs (without specialized training) achieve only 62.78% accuracy on the same defect identification task
Demonstrates effective identification of 'Inaccurate', 'Incomplete', and combined profile defects compared to baselines
Breakthrough Assessment
7/10
Addresses a critical bottleneck in user simulation (static/hallucinated profiles) with a logical diagnostic-treatment split. The integration with Sequential Recommenders for bidirectional evolution is a significant step toward realistic simulation.
⚙️ Technical Details
Problem Definition
Setting: User simulation in Sequential Recommendation where a simulator interacts with items to mimic real user behavior
Inputs: User's historical interaction sequence split into initial history (D_ini) and optimization history (D_opt)
Outputs: Optimized User Profile (P_opt) that accurately reflects user preferences
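The input split above can be sketched as follows. This is a minimal illustration, not the paper's procedure: the `Interaction` type, the `split_history` helper, and the 50/50 `ini_ratio` are all assumptions introduced here, since the summary does not state how the history is divided between D_ini and D_opt.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Interaction:
    """Hypothetical record of one user-item interaction."""
    item_id: str
    rating: int      # 1-5 scale
    timestamp: int   # any monotonically increasing time value


def split_history(
    history: List[Interaction], ini_ratio: float = 0.5
) -> Tuple[List[Interaction], List[Interaction]]:
    """Split a history chronologically into (D_ini, D_opt).

    ini_ratio is an assumed parameter; the split ratio used in the
    paper is not given in this summary.
    """
    if not 0.0 < ini_ratio < 1.0:
        raise ValueError("ini_ratio must be strictly between 0 and 1")
    ordered = sorted(history, key=lambda x: x.timestamp)
    cut = int(len(ordered) * ini_ratio)
    return ordered[:cut], ordered[cut:]


history = [
    Interaction("m3", 4, 30),
    Interaction("m1", 5, 10),
    Interaction("m4", 2, 40),
    Interaction("m2", 3, 20),
]
d_ini, d_opt = split_history(history)
print([x.item_id for x in d_ini])  # earliest interactions build the initial profile
print([x.item_id for x in d_opt])  # later interactions drive profile optimization
```

Sorting by timestamp before cutting keeps D_opt strictly later than D_ini, so the profile is never optimized against interactions it was initialized from.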
Pipeline Flow
Data Splitting: History split into Initialization and Optimization sets
Discrepancy Detection: Identify mismatches between simulator and real user decisions
Diagnostic Module: Specialized LLM predicts defect type (Inaccurate/Incomplete)
Treatment Module: Generalized LLM generates explanation and modification to update profile
Multi-round Interaction: Updated simulator interacts with Sequential Recommender
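The profile-optimization part of this flow (discrepancy detection, diagnosis, treatment) can be sketched as a loop. Everything below is a toy under stated assumptions: the profile is reduced to a genre-to-sentiment dictionary, and `simulate`, `diagnose`, and `treat` are hypothetical rule-based stand-ins for the simulator, the specialized diagnostic LLM, and the generalized treatment LLM. Only the rating threshold of 3 comes from the source.

```python
from typing import Dict, List, Tuple

Profile = Dict[str, str]   # genre -> "likes" | "dislikes" (toy representation)
Event = Tuple[str, int]    # (genre, real user's rating on a 1-5 scale)


def simulate(profile: Profile, genre: str) -> bool:
    """Toy simulator: accept only genres the profile says the user likes."""
    return profile.get(genre) == "likes"


def diagnose(profile: Profile, genre: str) -> str:
    """Rule-based stand-in for the specialized diagnostic LLM."""
    return "Inaccurate" if genre in profile else "Incomplete"


def treat(
    profile: Profile, genre: str, real_accept: bool, defect: str
) -> Tuple[Profile, str]:
    """Stand-in for the generalized LLM: return a new profile and an explanation."""
    sentiment = "likes" if real_accept else "dislikes"
    updated = dict(profile)  # copy so the caller's profile is not mutated
    updated[genre] = sentiment
    return updated, f"{defect}: set '{genre}' to '{sentiment}'"


def optimize_profile(
    profile: Profile, d_opt: List[Event], threshold: int = 3
) -> Tuple[Profile, List[str]]:
    log: List[str] = []
    for genre, rating in d_opt:
        real_accept = rating >= threshold
        if simulate(profile, genre) == real_accept:
            continue  # simulator agrees with the real user: no discrepancy case
        defect = diagnose(profile, genre)
        profile, explanation = treat(profile, genre, real_accept, defect)
        log.append(explanation)
    return profile, log


p_ini = {"comedy": "dislikes", "drama": "likes"}
d_opt = [("comedy", 5), ("sci-fi", 4), ("drama", 4), ("horror", 1)]
p_opt, log = optimize_profile(p_ini, d_opt)
print(p_opt)
print(log)
```

In this toy run, the comedy entry is diagnosed as Inaccurate and the missing sci-fi preference as Incomplete, while the drama and horror interactions trigger no update because the simulator already matches the real user.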
System Modules
Diagnostic Module (Profile Optimization)
Identifies the specific type of defect in the user profile based on a discrepancy case
Model or implementation: Specialized LLM (a small-scale general LLM, fine-tuned for defect diagnosis)
Treatment Module (Profile Optimization)
Analyzes the diagnosed defect and generates a refined user profile
Model or implementation: Generalized LLM (Large-scale, e.g., GPT-4)
Sequential Recommender (SR)
Provides recommendations to the simulator and updates its own state based on feedback
Model or implementation: Sequential Recommendation Model (e.g., Transformer-based)
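To make the recommender's role concrete, here is a small sketch of a multi-round loop in which a recommender proposes an item, the simulator responds, and the recommender updates its state from that feedback. `ToySequentialRecommender` and `run_rounds` are hypothetical names, and the recency-weighted genre scoring is an assumption chosen for brevity; it is not the Transformer-based model the summary mentions. The profile is held fixed here so that only the recommender's adaptation is visible.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class ToySequentialRecommender:
    """Hypothetical stand-in for an SR model: recency-weighted genre scores."""

    def __init__(self, catalog: List[str], decay: float = 0.8) -> None:
        self.catalog = list(catalog)
        self.decay = decay
        self.scores: Dict[str, float] = defaultdict(float)

    def recommend(self) -> str:
        # Highest score wins; ties are broken by catalog order.
        return max(self.catalog, key=lambda g: self.scores[g])

    def update(self, genre: str, accepted: bool) -> None:
        # Decay older evidence so recent feedback dominates, then add the new signal.
        for g in self.catalog:
            self.scores[g] *= self.decay
        self.scores[genre] += 1.0 if accepted else -1.0


def run_rounds(
    recommender: ToySequentialRecommender, profile: Dict[str, str], n_rounds: int
) -> List[Tuple[str, bool]]:
    trace = []
    for _ in range(n_rounds):
        genre = recommender.recommend()
        accepted = profile.get(genre) == "likes"  # toy simulator decision
        recommender.update(genre, accepted)
        trace.append((genre, accepted))
    return trace


profile = {"comedy": "dislikes", "sci-fi": "likes"}
sr = ToySequentialRecommender(["comedy", "drama", "sci-fi"])
trace = run_rounds(sr, profile, 4)
print(trace)
```

After two rejections the toy recommender shifts to sci-fi and stays there. In the full framework the profile would also be revised between rounds, which is what makes the evolution bidirectional.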
Novel Architectural Elements
Decoupled Diagnostic-Treatment architecture where a specialized small model guides a generalized large model
Integration of dynamic profile optimization within a sequential recommendation loop (bidirectional evolution)
Modeling
Base Model: Generalized LLM for treatment (e.g., GPT series); Specialized small LLM for diagnosis
Training Method: Supervised Fine-Tuning (SFT) on synthetic defect data
Objective Functions:
Purpose: Pre-training to learn domain semantics.
Formally: Next-token prediction loss on valid simulation traces: $\mathcal{L}_{\mathrm{Pre}}(\theta) = -\sum_{t} \log P_{\theta}(x_t \mid x_{<t})$
Purpose: Fine-tuning for defect diagnosis.
Formally: Next-token prediction loss computed only on output tokens (input tokens are masked): $\mathcal{L}_{\mathrm{FT}}(\theta) = -\sum_{t \in \mathrm{Output}} \log P_{\theta}(y_t \mid y_{<t}, \mathrm{Input})$
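The difference between the two objectives is only which token positions contribute to the sum. The sketch below illustrates that with plain Python, treating both objectives as negative log-likelihoods to be minimized (an assumption about sign convention). The `next_token_nll` helper and the probability values are made up for illustration; a real training setup would compute these from model logits.

```python
import math
from typing import Sequence


def next_token_nll(token_probs: Sequence[float], mask: Sequence[int]) -> float:
    """Sum of -log p over the positions where mask is 1.

    token_probs[t] is the probability the model assigns to the
    ground-truth token at position t, given all earlier tokens.
    """
    if len(token_probs) != len(mask):
        raise ValueError("token_probs and mask must have the same length")
    return -sum(math.log(p) for p, m in zip(token_probs, mask) if m)


# Illustrative values only: two prompt (Input) tokens followed by two Output tokens.
probs = [0.5, 0.25, 0.5, 0.125]
is_output = [0, 0, 1, 1]

l_pre = next_token_nll(probs, [1] * len(probs))  # pre-training: every position counts
l_ft = next_token_nll(probs, is_output)          # fine-tuning: Output positions only

print(f"L_Pre = {l_pre:.4f}")
print(f"L_FT  = {l_ft:.4f}")
```

Masking the input means the diagnostic model is not penalized for failing to reproduce the discrepancy case it was given; gradient signal comes only from the defect label it must produce.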
Training Data:
Pre-training: Corpus of cases where simulator decisions matched real user high-rated interactions
Fine-tuning: Synthetic profiles created by injecting defects (flipping sentiment for 'Inaccurate', deleting text for 'Incomplete') into valid profiles
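The defect-injection idea can be sketched as two small corruption functions. This is a simplified illustration under assumptions: a profile is modeled as a list of short statements, sentiment is flipped by swapping the words "likes" and "dislikes", and the function names are hypothetical. The paper's actual generation pipeline presumably operates on free-text profiles and is not reproduced here.

```python
import re
from typing import List, Tuple

_FLIP = {"likes": "dislikes", "dislikes": "likes"}
_SENTIMENT = re.compile(r"\b(likes|dislikes)\b")


def inject_inaccurate(profile: List[str], idx: int) -> List[str]:
    """Create an 'Inaccurate' defect by flipping the sentiment of one statement."""
    corrupted = list(profile)
    flipped, n = _SENTIMENT.subn(lambda m: _FLIP[m.group(1)], corrupted[idx], count=1)
    if n == 0:
        raise ValueError("statement has no sentiment word to flip")
    corrupted[idx] = flipped
    return corrupted


def inject_incomplete(profile: List[str], idx: int) -> List[str]:
    """Create an 'Incomplete' defect by deleting one statement."""
    return profile[:idx] + profile[idx + 1:]


valid = ["likes sci-fi", "dislikes horror", "likes long dramas"]

# Each training example pairs a corrupted profile with its defect label.
examples: List[Tuple[List[str], str]] = [
    (inject_inaccurate(valid, 0), "Inaccurate"),
    (inject_incomplete(valid, 1), "Incomplete"),
]
for corrupted, label in examples:
    print(label, corrupted)
```

Because the corruption is applied to a profile known to be valid, the defect label is known by construction, which is what lets the diagnostic model be trained with supervision.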
Key Hyperparameters:
rating_threshold: >= 3 (out of 5) for valid interactions
Compute: Not reported in the paper
Comparison to Prior Work
vs. Agent4Rec: DGDPO updates profiles dynamically step-by-step rather than using a fixed initial profile
vs. Self-reflection methods: DGDPO uses a specialized trained module for diagnosis to reduce hallucinations, achieving 92.20% accuracy vs. 62.78% for general-purpose LLMs
Limitations
Relies on 'Discrepancy Cases' (mismatch with ground truth) to trigger updates, which assumes historical data is the absolute ground truth
Requires training a specialized diagnostic module, adding complexity compared to zero-shot prompting
Evaluation of sequential interaction depends on the quality of the underlying Sequential Recommender
Reproducibility
The paper describes the synthetic data generation pipeline for the diagnostic module in detail. The text does not state whether code is available, and it does not specify the model size (parameter count) of the 'small-scale' diagnostic LLM.
📊 Experiments & Results
Evaluation Setup
Accuracy analysis of profile defect identification, plus (implied) simulation of recommendation performance
Benchmarks:
Three real-world datasets (Sequential Recommendation / User Simulation)
Metrics:
Defect Identification Accuracy
Statistical methodology: Not explicitly reported in the paper
Key Results
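| Benchmark | Metric | Baseline (general-purpose LLM) | This Paper (specialized diagnostic module) | Δ (percentage points) |
| --- | --- | --- | --- | --- |
| Profile Defect Identification | Accuracy (%) | 62.78 | 92.20 | +29.42 |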
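The reported gap can be read in more than one way. The short calculation below derives the absolute gain from the two accuracies in the table, along with the relative gain over the baseline and the share of baseline errors removed. The last two figures are not reported in the source; they are plain arithmetic on the two reported numbers (roughly 46.9% and 79.0% respectively).

```python
baseline_acc = 62.78  # general-purpose LLM accuracy (%), from the table
dgdpo_acc = 92.20     # specialized diagnostic module accuracy (%), from the table

# Absolute gain in percentage points.
abs_gain_pp = dgdpo_acc - baseline_acc

# Relative gain: improvement as a fraction of the baseline accuracy.
rel_gain_pct = 100.0 * abs_gain_pp / baseline_acc

# Error reduction: share of the baseline's misdiagnoses that are eliminated.
error_reduction_pct = 100.0 * abs_gain_pp / (100.0 - baseline_acc)

print(f"absolute gain:   {abs_gain_pp:.2f} pp")
print(f"relative gain:   {rel_gain_pct:.1f} %")
print(f"error reduction: {error_reduction_pct:.1f} %")
```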
Main Takeaways
General-purpose LLMs struggle with specific diagnostic tasks (62.78% accuracy), justifying the need for a specialized diagnostic module.
The Diagnostic-Treatment framework allows for targeted profile refinement (addressing Inaccurate vs Incomplete specifically) rather than generic regeneration.
Synthetic data generation (flipping sentiments, deleting details) is effective for training the diagnostic module.
📚 Prerequisite Knowledge
Prerequisites
Large Language Models (LLMs) in Recommender Systems
User Simulation concepts (Profile, Memory, Action)
Sequential Recommendation
Key Terms
DGDPO: Diagnostic-Guided Dynamic Profile Optimization—the proposed framework for iteratively refining user profiles
Sequential Recommenders (SRs): Recommender systems that model user interests as evolving sequences rather than static preferences
Discrepancy Case: An instance where the simulated user's behavior (e.g., rejecting an item) contradicts the real user's historical behavior (e.g., clicking the item)
Domain-Adaptive Pre-training: Training the diagnostic model on valid simulator interaction traces to learn domain logic before fine-tuning
Defect-Specific Fine-tuning: Fine-tuning the diagnostic model on synthetic data containing artificially injected profile errors (inaccurate/incomplete)
Inaccurate Defect: Profile contains descriptions contradicting real behavior (e.g., 'dislikes sci-fi' when user watches sci-fi)
Incomplete Defect: Profile lacks necessary descriptions to explain a user's interaction (e.g., missing 'likes sci-fi')