A Framework for Generating Conversational Recommendation Datasets from Behavioral Interactions

📝 Paper Summary

Synthetic Data Generation, Conversational Recommendation Systems (CRS)

ConvRecStudio generates realistic multi-turn conversational recommendation datasets by using LLM agents grounded in real historical user-item interactions and temporal aspect profiles.

Core Problem

Combining collaborative filtering (history-based) and conversational recommendation (dialog-based) is hindered by the lack of datasets containing both long-term interaction logs and corresponding natural language dialogs.

Why it matters:

Current CRS models ignore long-term user history, leading to generic suggestions
Traditional recommenders cannot interactively elicit immediate needs
Manual collection of grounded conversational data is prohibitively expensive and requires domain expertise

Concrete Example: A user typically buys tech gadgets (long-term preference) but explicitly asks for a 'budget-friendly speaker for a party' (immediate need). Existing datasets have either the purchase log OR the chat, but not both linked together, preventing models from learning to fuse these signals.

Key Novelty

ConvRecStudio Framework

Constructs temporal user/item profiles from reviews to capture evolving preferences over fine-grained aspects (e.g., battery life) without manual annotation
Uses a Semantic Dialog Plan (a DAG of dialog acts) to structure the conversation flow while allowing LLMs flexibility in phrasing
Simulates dialogs using two role-playing LLM agents (User and System) constrained by the plan and grounded in real timestamped interactions

Architecture

[Figure: overview of the ConvRecStudio framework pipeline]

Evaluation Highlights

Generated over 38,000 multi-turn dialogs across three domains (MobileRec, Yelp, Amazon Electronics) grounded in real user behavior
A proposed cross-attention model trained on this data achieves a 10.9% improvement in Hit@1 on the Yelp dataset compared to the strongest baseline
Human evaluation confirms generated dialogs are fluent, coherent, and faithfully reflect the underlying user-item interactions

Breakthrough Assessment

8/10

Addresses a critical data scarcity bottleneck in conversational recommendation. The generated datasets enable a new class of models that fuse collaborative and conversational signals.

⚙️ Technical Details

Problem Definition

Setting: Synthetic data generation from behavioral logs

Inputs: Historical user-item interaction dataset D containing users, items, reviews, and timestamps

Outputs: Multi-turn conversational dataset C where each dialog is grounded in a specific user-item interaction pair
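
To make the input and output shapes concrete, here is a minimal sketch of D and C as Python data structures. All class and field names (Interaction, Turn, Dialog, history_before) are hypothetical and not taken from the paper. Restricting the history to interactions that precede the grounding interaction is also an assumption, made so that a dialog cannot leak future behavior.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Interaction:
    """One record of the behavioral dataset D (field names are hypothetical)."""
    user_id: str
    item_id: str
    review: str
    timestamp: int  # any sortable time unit, e.g. Unix seconds


@dataclass
class Turn:
    speaker: str  # "user" or "system"
    text: str


@dataclass
class Dialog:
    """One element of the conversational dataset C, tied to a single interaction."""
    grounding: Interaction
    turns: List[Turn] = field(default_factory=list)


def history_before(dataset: List[Interaction], target: Interaction) -> List[Interaction]:
    """Return the same user's interactions that strictly precede the target, oldest first.

    Assumption: only earlier interactions count as history for a grounded dialog.
    """
    past = [x for x in dataset
            if x.user_id == target.user_id and x.timestamp < target.timestamp]
    return sorted(past, key=lambda x: x.timestamp)
```

A generator would then call history_before for each grounding interaction and attach the simulated turns to a Dialog.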

Pipeline Flow

Temporal Profiling (User/Item Modeling)
Semantic Dialog Planning
Multi-Turn Simulation (LLM Agents)

System Modules

Temporal Profiling

Construct evolving user profiles and global item profiles from reviews and metadata

Model or implementation: Unsupervised aspect induction and sentiment estimation
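
The summary does not say how aspects are induced or how sentiment is estimated, so the sketch below swaps in deliberately simple stand-ins: a tiny hand-written keyword lexicon and an exponential time decay. The names ASPECT_KEYWORDS, POSITIVE, NEGATIVE, and half_life are all hypothetical. It only illustrates the idea of a profile that changes depending on the point in time at which it is queried.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical stand-ins. The framework is described as unsupervised; these
# hand-written lexicons exist only to keep the sketch self-contained.
ASPECT_KEYWORDS = {
    "battery": ["battery", "charge"],
    "price": ["price", "cheap", "expensive"],
}
POSITIVE = {"great", "good", "cheap", "long"}
NEGATIVE = {"bad", "poor", "expensive", "short"}


def review_aspect_sentiment(review: str) -> Dict[str, float]:
    """Assign the review's overall polarity to every aspect it mentions."""
    words = re.findall(r"[a-z]+", review.lower())
    polarity = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return {aspect: float(polarity)
            for aspect, keywords in ASPECT_KEYWORDS.items()
            if any(k in words for k in keywords)}


def user_profile_at(reviews: List[Tuple[int, str]], now: int,
                    half_life: float = 30.0) -> Dict[str, float]:
    """Time-decayed mean sentiment per aspect, using only reviews with timestamp <= now.

    half_life is in the same unit as the timestamps (an assumption of this sketch).
    """
    num: Dict[str, float] = defaultdict(float)
    den: Dict[str, float] = defaultdict(float)
    for ts, text in reviews:
        if ts > now:
            continue  # never look into the future of the profile
        weight = 0.5 ** ((now - ts) / half_life)
        for aspect, score in review_aspect_sentiment(text).items():
            num[aspect] += weight * score
            den[aspect] += weight
    # Skip aspects whose total weight underflowed to zero.
    return {a: num[a] / den[a] for a in num if den[a] > 0}
```

Querying the same review list at two different values of now yields two different profiles, which is the property the temporal profile is meant to capture.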

Semantic Dialog Planning

Generate a structured plan for the conversation to ensure logical flow

Model or implementation: DAG-based planner
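
As an illustration of what a DAG of dialog acts can look like, the sketch below encodes a small plan graph, checks that it has no cycles, and samples one path through it. The act names and the random-walk sampler are hypothetical; the summary does not specify how the planner selects a path.

```python
import graphlib
import random
from typing import Dict, List

# Hypothetical dialog acts. Each key maps to the acts that may directly follow it.
PLAN_DAG: Dict[str, List[str]] = {
    "greet": ["elicit_preference"],
    "elicit_preference": ["clarify_aspect", "recommend"],
    "clarify_aspect": ["recommend"],
    "recommend": ["accept"],
    "accept": [],
}


def is_acyclic(dag: Dict[str, List[str]]) -> bool:
    """Return True if the graph has no cycle.

    TopologicalSorter interprets the mapping as node -> predecessors, but a
    cycle exists regardless of edge direction, so the check is still valid.
    """
    try:
        tuple(graphlib.TopologicalSorter(dag).static_order())
        return True
    except graphlib.CycleError:
        return False


def sample_plan(dag: Dict[str, List[str]], start: str, rng: random.Random) -> List[str]:
    """Random walk from start until an act with no successors is reached."""
    path = [start]
    while dag[path[-1]]:
        path.append(rng.choice(dag[path[-1]]))
    return path
```

Because the graph is acyclic, every sampled path terminates, while the branching still allows different conversations to follow different act sequences.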

User Agent (Simulation)

Simulate the user in the conversation

Model or implementation: Large Language Model (User LLM)

System Agent (Simulation)

Simulate the recommender system

Model or implementation: Large Language Model (System LLM)

Novel Architectural Elements

Plan-constrained generation: A semantic DAG controls the LLM agents, which is intended to keep dialogs coherent and to curb the hallucination and topic drift common in free-form generation
Dual-agent simulation with asymmetric information: The User agent knows the target item; the System agent knows only the history and profiles, mirroring a real deployment in which the recommender cannot see what the user will choose
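
The two elements above can be sketched as a simulation loop in which each agent receives only its own private context plus the shared transcript. This is a minimal illustration with assumptions throughout: the LLM type is a hypothetical prompt-in, text-out callable, the act names and USER_ACTS set are invented, and the prompt layout is not the paper's template.

```python
from typing import Callable, Dict, List, Tuple

LLM = Callable[[str], str]  # hypothetical interface: prompt in, utterance out

# Hypothetical act names; acts in this set are spoken by the user agent.
USER_ACTS = {"greet", "state_need", "accept"}


def build_prompt(role: str, act: str, context: Dict[str, str],
                 transcript: List[Tuple[str, str]]) -> str:
    """Combine the role's private context, the shared transcript, and the next act."""
    facts = "; ".join(f"{k}={v}" for k, v in sorted(context.items()))
    said = " | ".join(f"{who}: {text}" for who, text in transcript)
    return f"[{role}] known: {facts}\nso far: {said}\nnext act: {act}"


def simulate(plan: List[str], user_ctx: Dict[str, str], system_ctx: Dict[str, str],
             user_llm: LLM, system_llm: LLM) -> List[Tuple[str, str]]:
    """Walk the plan in order; each act is realized by the agent that owns it."""
    transcript: List[Tuple[str, str]] = []
    for act in plan:
        if act in USER_ACTS:
            role, ctx, llm = "user", user_ctx, user_llm
        else:
            role, ctx, llm = "system", system_ctx, system_llm
        utterance = llm(build_prompt(role, act, ctx, transcript))
        transcript.append((role, utterance))
    return transcript
```

In this sketch the asymmetry comes from passing the target item only in user_ctx. The system agent can still learn about the target indirectly if the user agent reveals it in an utterance, since the transcript is shared.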

Reproducibility

The paper states that the framework, datasets, and model will be released, but no URL is provided in the text. Evaluation uses public datasets (MobileRec, Yelp, Amazon Electronics). Specific LLM versions and prompt templates are not detailed in the text reviewed here.

📊 Experiments & Results

Evaluation Setup

Dataset generation followed by downstream recommendation task evaluation

Benchmarks:

MobileRec (Mobile App Recommendation)
Yelp (Local Business Recommendation)
Amazon Electronics (Consumer Electronics Recommendation)

Metrics:

Hit@K
NDCG@K
Human Evaluation (Naturalness, Coherence, Groundedness)

Statistical methodology: Not explicitly reported in the paper
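
For reference, the two ranking metrics listed above can be computed as follows. The sketch assumes exactly one ground-truth item per dialog, which matches the grounding of each dialog in a single user-item interaction but is not confirmed as the paper's exact evaluation protocol. Function names are illustrative.

```python
import math
from typing import Callable, List, Sequence


def hit_at_k(ranked: Sequence[str], target: str, k: int) -> float:
    """1.0 if the target is among the first k ranked items, else 0.0."""
    return 1.0 if target in ranked[:k] else 0.0


def ndcg_at_k(ranked: Sequence[str], target: str, k: int) -> float:
    """NDCG@K with a single relevant item.

    The ideal DCG is 1 (target at rank 1), so NDCG reduces to
    1 / log2(rank + 1) when the target is in the top k, and 0 otherwise.
    """
    for index, item in enumerate(ranked[:k]):
        if item == target:
            return 1.0 / math.log2(index + 2)  # index 0 is rank 1
    return 0.0


def mean_metric(metric: Callable[[Sequence[str], str, int], float],
                rankings: List[Sequence[str]], targets: List[str], k: int) -> float:
    """Average a per-example metric over a test set."""
    return sum(metric(r, t, k) for r, t in zip(rankings, targets)) / len(targets)
```

Hit@K treats every position inside the top K equally, while NDCG@K discounts lower positions, so the two can disagree about which model is better.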

Main Takeaways

ConvRecStudio generated over 38,000 dialogs across three diverse domains (MobileRec, Yelp, Amazon Electronics), demonstrating scalability.
A downstream cross-attention transformer model trained on this synthetic data consistently outperformed baselines (History-only, Dialog-only, Naive Fusion) across all datasets.
The proposed unified model achieved a 10.9% improvement in Hit@1 on Yelp, highlighting the value of fusing collaborative history with conversational context.
Human evaluators rated the synthetic dialogs as natural, coherent, and grounded, validating the effectiveness of the profile-driven, plan-constrained generation pipeline.
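
The summary does not describe the internals of the cross-attention model, so the following is only a generic sketch of the fusion idea: dialog-turn embeddings act as queries over history-item embeddings (single-head scaled dot-product attention, no learned projections), and the fused vector ranks candidate items. All function names, the mean-pooling step, and the dot-product scorer are assumptions.

```python
import math
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries: List[Vector], keys_values: List[Vector]) -> List[Vector]:
    """Each query (dialog turn) attends over the history items.

    Returns one attention-weighted average of the history vectors per query.
    """
    dim = len(keys_values[0])
    scale = math.sqrt(dim)
    attended = []
    for q in queries:
        weights = softmax([dot(q, kv) / scale for kv in keys_values])
        attended.append([sum(w * kv[d] for w, kv in zip(weights, keys_values))
                         for d in range(dim)])
    return attended


def fuse_and_rank(dialog_vecs: List[Vector], history_vecs: List[Vector],
                  candidates: Dict[str, Vector]) -> List[str]:
    """Add attended history to each dialog vector, mean-pool, rank by dot product."""
    attended = cross_attend(dialog_vecs, history_vecs)
    dim = len(dialog_vecs[0])
    fused = [sum(d[i] + a[i] for d, a in zip(dialog_vecs, attended)) / len(dialog_vecs)
             for i in range(dim)]
    return sorted(candidates, key=lambda name: dot(fused, candidates[name]), reverse=True)
```

A trained model would add learned query, key, and value projections and multiple heads; the sketch keeps only the step in which conversational context selects the relevant parts of the interaction history.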

📚 Prerequisite Knowledge

Prerequisites

Conversational Recommendation Systems (CRS)
Collaborative Filtering
Large Language Models (LLMs)
Aspect-Based Sentiment Analysis

Key Terms

CRS: Conversational Recommendation Systems, which interact with users via natural language to elicit preferences and make suggestions

Collaborative Filtering: A recommendation technique that predicts user preferences based on the historical behavior of similar users

DAG: Directed Acyclic Graph, a structure used here to plan the flow of dialog acts (e.g., greeting → preference elicitation) without loops

Hit@K: A metric measuring the fraction of test cases in which the correct item appears among the top K recommendations

NDCG: Normalized Discounted Cumulative Gain, a ranking metric that gives more credit when correct items appear earlier in the list

Dialog Act: The function or intent of a specific utterance in a conversation (e.g., asking for clarification, making a recommendation)

Temporal Profiling: Creating user profiles that track how preferences for specific item aspects (like price or quality) change over time