DRDT: Dynamic Reflection with Divergent Thinking for LLM-based Sequential Recommendation

📝 Paper Summary

Sequential Recommendation, Prompt Engineering

DRDT improves sequential recommendation without fine-tuning by combining a collaborative example retriever with a 'probe-critique-reflect' prompting loop that simulates human learning to handle noise and evolving preferences.

Core Problem

Existing LLM prompting strategies (ICL, CoT) fail to capture collaborative signals across datasets, struggle with noisy sequences, and cannot adequately track the temporal evolution of user preferences.

Why it matters:

Standard prompts rely only on the current sequence, missing the 'collaborative' view (patterns from similar users) crucial for recommendation.
Convergent thinking (standard CoT) often hallucinates reasoning paths based solely on similarity, ignoring diverse user motives (e.g., price vs. quality).
Noise in user history can mislead the LLM if not explicitly identified and critiqued, leading to error accumulation.

Concrete Example: A user's history might contain noisy interactions (random clicks) mixed with genuine preference signals. A standard Chain-of-Thought prompt might force a similarity-based justification for the noisy item, leading to a hallucinated preference. DRDT uses 'Divergent Thinking' to analyze multiple aspects (price, color, reviews) and 'Dynamic Reflection' to critique the prediction, identifying the noise rather than blindly following it.

Key Novelty

Dynamic Reflection with Divergent Thinking (DRDT) in a Retriever-Reranker Framework

**Collaborative In-Context Demonstration Retriever:** Instead of random examples, it retrieves sequences from *other* users that end with the same item as the target user's recent history, explicitly injecting collaborative signals.
**Divergent Thinking:** Shifts from finding a single reasoning path (convergent) to analyzing interactions from multiple dimensions (price, quality, etc.) to capture personalized motives.
**Dynamic Reflection:** A temporal reasoning loop where the LLM 'probes' a next item, 'critiques' its own prediction/analysis, and 'reflects' to adjust its understanding step-by-step, mimicking human learning.
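The Divergent Thinking idea can be illustrated with a small prompt builder. This is a minimal sketch, not the paper's actual template: the function name `build_divergent_prompt`, the aspect list, and the prompt wording are all assumptions made for illustration.

```python
from typing import Sequence

# Hypothetical aspect list; the summary only names price, quality,
# color, and reviews as examples of aspects.
DEFAULT_ASPECTS = ("price", "quality", "color", "reviews")


def build_divergent_prompt(
    history: Sequence[str],
    aspects: Sequence[str] = DEFAULT_ASPECTS,
) -> str:
    """Ask for a per-aspect analysis instead of one similarity-based path."""
    lines = ["The user interacted with these items, oldest first:"]
    lines += [f"{i}. {title}" for i, title in enumerate(history, start=1)]
    lines.append("Analyze the user's likely motives separately for each aspect:")
    lines += [f"- {aspect}" for aspect in aspects]
    lines.append("Then summarize the user's preferences across all aspects.")
    return "\n".join(lines)


prompt = build_divergent_prompt(["Lip balm", "Face serum"])
print(prompt)
```

Listing the aspects explicitly is one way to keep the model from collapsing onto a single "these items are similar" explanation.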

Breakthrough Assessment

7/10

Addresses two critical gaps in LLM-based recommendation (collaborative signals and temporal preference evolution) with a logically sound prompting framework. Achieves strong performance without fine-tuning (7B models beat GPT-3.5), though the iterative prompting adds inference-time complexity.

⚙️ Technical Details

Problem Definition

Setting: Sequential Recommendation

Inputs: User historical interaction sequence

Outputs: Next item to be interacted with

Pipeline Flow

Collaborative In-Context Retriever
LLM Inference (DRDT Principle)
Reranking (Implicit)

System Modules

Collaborative In-Context Demonstration Retriever

Gather sequences from the dataset that end with the same item as the target user's sequence to serve as ICL examples

Model or implementation: Retrieval Algorithm (specifics not detailed in snippet)
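Since the retrieval algorithm is not specified, the following is only a sketch of one plausible reading. It scans other users' sequences for the target user's most recent item and keeps the prefix ending at that item. Treating the item that followed as the demonstration's answer, along with the names `retrieve_collaborative_demos` and `max_demos`, is an assumption of this sketch and not something the source confirms.

```python
from typing import Dict, List, Tuple

# (prefix ending in the shared item, item that followed it)
Demo = Tuple[List[str], str]


def retrieve_collaborative_demos(
    target_user: str,
    sequences: Dict[str, List[str]],
    max_demos: int = 3,
) -> List[Demo]:
    """Collect demonstrations from other users whose prefix ends with
    the same item as the target user's history."""
    last_item = sequences[target_user][-1]
    demos: List[Demo] = []
    for user, seq in sequences.items():
        if user == target_user:
            continue
        # Only positions that have a following item can yield an answer.
        for pos in range(len(seq) - 1):
            if seq[pos] == last_item:
                demos.append((seq[: pos + 1], seq[pos + 1]))
                break  # at most one demonstration per user
        if len(demos) == max_demos:
            break
    return demos


sequences = {
    "u1": ["shampoo", "conditioner", "hair mask"],
    "u2": ["comb", "hair mask", "hair oil"],
    "u3": ["soap", "towel"],
}
print(retrieve_collaborative_demos("u1", sequences))
```

Each returned pair can be rendered as an in-context example ("history ... next item ...") ahead of the target user's own sequence.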

Reasoning Engine (DRDT)

Perform multi-aspect analysis (Divergent Thinking) and iterative self-correction (Dynamic Reflection)

Model or implementation: Vicuna-7b, Openchat-7b, or GPT-3.5 (Inference only)
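The probe-critique-reflect loop can be sketched as three prompts per time step, run over the user's history in order. This is an illustrative sketch under stated assumptions: `llm` stands for any text-in, text-out callable (hypothetical here), the prompt wording is invented, and comparing each probe against the item the user actually chose next is one interpretation of "temporal user feedback".

```python
from typing import Callable, List


def dynamic_reflection(
    history: List[str],
    llm: Callable[[str], str],
    warmup: int = 1,
) -> str:
    """Walk the history; at each step probe, critique, and reflect.

    Returns the final preference summary produced by the model.
    """
    understanding = "No preference summary yet."
    for t in range(warmup, len(history)):
        seen, actual = history[:t], history[t]
        # Probe: predict the next item from what has been seen so far.
        probe = llm(
            f"Preference summary: {understanding}\n"
            f"History: {seen}\n"
            "Predict the next item and explain why."
        )
        # Critique: compare the prediction with the actual interaction.
        critique = llm(
            f"Prediction and analysis: {probe}\n"
            f"Actual next item: {actual}\n"
            "Critique the prediction and note whether the actual item looks like noise."
        )
        # Reflect: revise the running summary using the critique.
        understanding = llm(
            f"Previous summary: {understanding}\n"
            f"Critique: {critique}\n"
            "Rewrite the preference summary."
        )
    return understanding


# Stand-in model for demonstration only; replace with a real LLM client.
def echo_llm(prompt: str) -> str:
    return f"[model reply to {len(prompt)} chars]"


print(dynamic_reflection(["lipstick", "mascara", "phone case"], echo_llm))
```

Note that the loop issues three model calls per history step, which is consistent with the higher inference-time cost listed under Limitations.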

Novel Architectural Elements

Integration of a collaborative retriever specifically for constructing ICL prompts in sequential recommendation
Dynamic reflection loop (Probe-Critique-Reflect) implemented purely via prompting to handle temporal preference evolution

Modeling

Base Model: Evaluated on 6 LLMs including Vicuna-7b, Openchat-7b, and GPT-Turbo-3.5

Training Method: Zero-shot and Few-shot prompting (Inference only)

Compute: Not reported in the paper

Comparison to Prior Work

vs. TALLRec: TALLRec relies on parameter-efficient fine-tuning, whereas DRDT is inference-only and requires no fine-tuning
vs. Standard CoT: DRDT uses 'Divergent Thinking' (multi-aspect) instead of 'Convergent Thinking' (single path similarity) to avoid hallucinations
vs. Standard ICL: DRDT retrieves 'collaborative' examples (sequences from other users that end with the same item) rather than random segments

Limitations

Relies on the context window length of the LLM
Inference-time computational cost is likely higher due to the iterative reflection steps
Performance depends on the quality of the retrieved collaborative demonstrations

Reproducibility

The prompt principles (Divergent Thinking, Dynamic Reflection) are described, but no code URL is provided in the text. The retriever implementation and the prompt templates are described only conceptually.

📊 Experiments & Results

Evaluation Setup

Sequential Recommendation on Amazon datasets

Benchmarks:

Beauty (Sequential Recommendation)
Sports (Sequential Recommendation)
Toys (Sequential Recommendation)

Metrics:

NDCG@10
Statistical methodology: Not explicitly reported in the paper
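For reference, NDCG@10 in next-item recommendation is commonly computed with a single relevant item per user, in which case the ideal DCG is 1 and the score reduces to 1 / log2(rank + 1). The sketch below assumes that single-target setting; the paper's exact evaluation protocol (for example, candidate sampling) is not stated in this summary, and `ndcg_at_k` is a name chosen here for illustration.

```python
import math
from typing import Sequence


def ndcg_at_k(ranked_items: Sequence[str], target: str, k: int = 10) -> float:
    """NDCG@k with exactly one relevant item (the held-out next item)."""
    for rank, item in enumerate(ranked_items[:k], start=1):
        if item == target:
            # Ideal DCG is 1 with a single relevant item, so DCG is the score.
            return 1.0 / math.log2(rank + 1)
    return 0.0  # target not in the top k


print(ndcg_at_k(["a", "b", "c"], "a"))  # 1.0
print(ndcg_at_k(["a", "b", "c"], "c"))  # 0.5
print(ndcg_at_k(["a", "b", "c"], "z"))  # 0.0
```

The reported metric would then be the mean of this per-user score over all test users.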

Main Takeaways

The proposed DRDT strategy allows smaller 7B models (Vicuna-7b, Openchat-7b) to outperform GPT-Turbo-3.5 on three datasets (Beauty, Sports, Toys).
Integrating collaborative signals via the In-Context Retriever addresses the isolation of standard LLM prompts.
The Dynamic Reflection mechanism (probing, critiquing, reflecting) effectively manages noise and captures temporal preference evolution better than static CoT.
Results suggest that reasoning strategy (prompt design) can be as critical as model size or training for sequential recommendation.

📚 Prerequisite Knowledge

Prerequisites

In-Context Learning (ICL)
Chain-of-Thought (CoT) prompting
Sequential Recommendation basics

Key Terms

ICL: In-Context Learning—prompting an LLM with examples in the input context to guide its behavior without weight updates

CoT: Chain-of-Thought—a prompting technique encouraging LLMs to generate intermediate reasoning steps before the final answer

Collaborative Signal: Patterns derived from the behavior of other users in the dataset who share similar preferences or histories

Divergent Thinking: A reasoning paradigm proposed here that analyzes user engagement from multiple aspects (price, color, etc.) rather than a single similarity path

Dynamic Reflection: An iterative process (Probe, Critique, Reflect) where the LLM evaluates its own predictions against temporal user feedback to refine its preference model

NDCG: Normalized Discounted Cumulative Gain—a measure of ranking quality that accounts for the position of relevant items

Hallucination: When an LLM generates plausible-sounding but factually incorrect or ungrounded explanations for user behavior