DRDT: Dynamic Reflection with Divergent Thinking for LLM-based Sequential Recommendation

Yu Wang, Zhiwei Liu, Jianguo Zhang, Weiran Yao, Shelby Heinecke, Philip S. Yu
University of Illinois Chicago, Salesforce AI Research
arXiv (2023)
Recommendation Reasoning P13N

📝 Paper Summary

Sequential Recommendation Prompt Engineering
DRDT improves sequential recommendation without fine-tuning by combining a collaborative example retriever with a 'probe-critique-reflect' prompting loop that simulates human learning to handle noise and evolving preferences.
Core Problem
Existing LLM prompting strategies (ICL, CoT) fail to capture collaborative signals across datasets, struggle with noisy sequences, and cannot adequately track the temporal evolution of user preferences.
Why it matters:
  • Standard prompts rely only on the current sequence, missing the 'collaborative' view (patterns from similar users) crucial for recommendation.
  • Convergent thinking (standard CoT) often hallucinates reasoning paths based solely on item similarity, ignoring diverse user motives (e.g., price vs. quality).
  • Noise in user history can mislead the LLM if not explicitly identified and critiqued, leading to error accumulation.
Concrete Example: A user's history might contain noisy interactions (random clicks) mixed with genuine preference signals. A standard Chain-of-Thought prompt might force a similarity-based justification for the noisy item, leading to a hallucinated preference. DRDT uses 'Divergent Thinking' to analyze multiple aspects (price, color, reviews) and 'Dynamic Reflection' to critique the prediction, identifying the noise rather than blindly following it.
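The probe-critique-reflect loop described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function name `dynamic_reflection`, the prompt wording, the aspect list, and the choice to critique each probe against the held-out next item in the history are all assumptions made for the example. The `llm` argument is any text-in, text-out callable, and `stub_llm` is a placeholder so the sketch runs without a model.

```python
from typing import Callable, List, Tuple


def dynamic_reflection(
    history: List[str],
    llm: Callable[[str], str],
    aspects: List[str],
) -> Tuple[str, List[dict]]:
    """Walk through a user's history step by step: probe, critique, reflect.

    Hypothetical sketch: prompts and loop structure are illustrative only.
    """
    understanding = "none yet"
    trace = []
    aspect_text = ", ".join(aspects)
    for t in range(1, len(history)):
        seen, actual = history[:t], history[t]
        # Probe: analyze the history along several aspects (divergent
        # thinking) and guess the next item.
        probe = llm(
            f"[probe] History: {seen}. Current understanding: {understanding}. "
            f"Analyze the history along these aspects: {aspect_text}. "
            "Predict the next item."
        )
        # Critique: compare the guess with what the user actually did
        # (assumption: the held-out item serves as the reference).
        critique = llm(
            f"[critique] Prediction: {probe}. Actual next item: {actual}. "
            "Explain what the analysis missed, or whether the actual item "
            "looks like noise."
        )
        # Reflect: revise the running summary of the user's preferences.
        understanding = llm(
            f"[reflect] Previous understanding: {understanding}. "
            f"Critique: {critique}. Rewrite the understanding of this user."
        )
        trace.append(
            {"step": t, "probe": probe, "critique": critique,
             "understanding": understanding}
        )
    return understanding, trace


def stub_llm(prompt: str) -> str:
    """Placeholder model: echoes the stage tag so the loop can be run offline."""
    return prompt.split("]")[0].lstrip("[") + "-output"
```

Calling `dynamic_reflection(["mouse", "keyboard", "socks"], stub_llm, ["price", "color", "reviews"])` makes three calls per step; with a real model, the critique stage is where an out-of-pattern item such as `"socks"` could be flagged as noise instead of being forced into a similarity-based story.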
Key Novelty
Dynamic Reflection with Divergent Thinking (DRDT) in a Retriever-Reranker Framework
  • **Collaborative In-Context Demonstration Retriever:** Instead of random examples, it retrieves sequences from *other* users that end with the same item as the target user's recent history, explicitly injecting collaborative signals.
  • **Divergent Thinking:** Shifts from finding a single reasoning path (convergent) to analyzing interactions from multiple dimensions (price, quality, etc.) to capture personalized motives.
  • **Dynamic Reflection:** A temporal reasoning loop where the LLM 'probes' a next item, 'critiques' its own prediction/analysis, and 'reflects' to adjust its understanding step-by-step, mimicking human learning.
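As a rough illustration of the collaborative retriever in the first bullet, the sketch below looks through other users' sequences for the target user's most recent item and cuts each match at that point, so the demonstration history ends with the same item and the following interaction serves as its answer. The function name, the dictionary layout, and the use of the item right after the match as the demonstration label are assumptions for this example, not details confirmed by the summary.

```python
from typing import Dict, List


def retrieve_demonstrations(
    target_history: List[str],
    other_users: Dict[str, List[str]],
    max_demos: int = 2,
) -> List[dict]:
    """Pick in-context demonstrations that share the target's last item.

    Hypothetical sketch: exact matching on the last item only.
    """
    if not target_history:
        return []
    anchor = target_history[-1]
    demos: List[dict] = []
    for user_id, seq in other_users.items():
        # Only positions that have a following item can supply a label.
        for i in range(len(seq) - 1):
            if seq[i] == anchor:
                demos.append(
                    {
                        "user": user_id,
                        "history": seq[: i + 1],  # ends with the anchor item
                        "next_item": seq[i + 1],  # what this user did next
                    }
                )
                break  # one demonstration per user
        if len(demos) >= max_demos:
            break
    return demos
```

Each returned entry can then be formatted into the prompt as a worked example ("a user with this history went on to choose that item"), which is how the collaborative signal reaches a model that otherwise sees only the current sequence.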
Breakthrough Assessment
7/10
Addresses two critical gaps in LLM-based recommendation (collaborative signals and the temporal evolution of preferences) with a logically sound prompting framework. Achieves strong performance (beating GPT-3.5 with 7B models) without fine-tuning, though at the cost of added inference-time complexity from the multi-step prompting loop.