AdaRec: Adaptive Recommendation with LLMs via Narrative Profiling and Dual-Channel Reasoning

📝 Paper Summary

Recommender Systems, LLM for Tabular Data

AdaRec transforms tabular user data into natural language narratives and employs a dual-channel process—combining historical peer similarity with causal feature discovery—to enable adaptable, high-performance recommendations.

Core Problem

Traditional recommender systems require extensive manual feature engineering and retraining for new distributions, while existing LLM-based methods are often computationally expensive agents or rely on static, non-adaptive text profiles.

Why it matters:

E-commerce platforms face dynamic user preferences where static models fail to adapt quickly without costly retraining
Current LLM approaches such as RecMind or MINT either lack interpretability or are brittle under data shifts because they do not reason causally

Concrete Example: In a brand recommendation task, a standard model might recommend 'Brand A' based on simple correlation. AdaRec identifies that 'price sensitivity' is the causal factor for this specific user (via causal discovery) and contextualizes their spending history as 'budget-conscious' (via narrative profiling) to correctly recommend 'Brand B'.

Key Novelty

Dual-Channel Reasoning with Narrative Profiling

Transforms raw numerical features into context-aware 'narrative profiles' using statistical distributions, making data semantic for LLMs
Splits reasoning into two channels: 'Horizontal Alignment' (finding similar peers) and 'Vertical Attribution' (discovering causal features via FCI), combining social proof with causal drivers

Evaluation Highlights

+7.66 F1 points (86.67 to 94.33) on Customer Response Prediction (5-shot) vs. the LightGBM baseline
+18.55 F1 points (55.58 to 74.13) in zero-shot settings vs. expert-crafted profiling strategies using Qwen-2.5
Achieves comparable performance to fully fine-tuned models on cross-task transfer (training on response prediction, testing on brand recommendation)

Breakthrough Assessment

8/10

Significantly outperforms strong tabular baselines (LightGBM) using an LLM approach, which is rare. The integration of causal inference (FCI) into the prompt structure effectively addresses the lack of reasoning in standard RAG.

⚙️ Technical Details

Problem Definition

Setting: Personalized recommendation on tabular data (Binary Classification and Top-K Ranking)

Inputs: User feature vector x_theta (d-dimensional tabular data)

Outputs: Recommendation R (binary label or ranked list of brands)

Pipeline Flow

Narrative Profiler: User Vector -> Textual Profile
Retrieval Channel: Target User -> Top-k Similar Historical Users
Causal Channel: Reference Set -> FCI Algorithm -> Causal Features
Reasoning Engine: Text Profile + Similar Cases + Causal Features -> LLM -> Prediction

System Modules

Narrative Profiler

Convert numerical features into context-aware text descriptions using global statistics

Model or implementation: LLM (Claude-3.5/Qwen-2.5/Llama-3.1)
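As a rough illustration of what "context-aware text descriptions using global statistics" could look like, the sketch below ranks each numeric feature against a reference population and renders it as a sentence. It is a rule-based stand-in: the summary says AdaRec uses an LLM for this step, and the function names, band boundaries, and wording here are all assumptions made for the example.

```python
import numpy as np

def percentile_rank(value, population):
    """Percentage of reference users whose value is <= `value`."""
    population = np.asarray(population, dtype=float)
    return 100.0 * np.count_nonzero(population <= value) / population.size

def describe_feature(name, value, population):
    """Render one numeric feature as a sentence anchored to global statistics."""
    pct = percentile_rank(value, population)
    # Hypothetical band boundaries, chosen only for illustration.
    if pct >= 90:
        band = "far above typical"
    elif pct >= 60:
        band = "above typical"
    elif pct >= 40:
        band = "typical"
    elif pct >= 10:
        band = "below typical"
    else:
        band = "far below typical"
    return f"{name} is {value:g}, {band} (at or above {pct:.0f}% of users)"

def narrative_profile(user, reference):
    """Join per-feature sentences; `reference` maps feature name -> population values."""
    sentences = [describe_feature(k, v, reference[k]) for k, v in user.items()]
    return ". ".join(sentences) + "."
```

In a full pipeline the resulting text (or the underlying percentiles) would presumably be handed to the LLM to produce a more fluent narrative.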

Historical Case Discovery

Identify similar users (peers) and build a reference set for causal discovery

Model or implementation: Cosine Similarity
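A minimal sketch of peer retrieval by cosine similarity, assuming users are already encoded as numeric vectors. The function name `top_k_similar` and the default `k=5` are illustrative choices, not taken from the paper's code.

```python
import numpy as np

def top_k_similar(target, history, k=5):
    """Return indices and cosine similarities of the k historical users closest to `target`."""
    target = np.asarray(target, dtype=float)
    history = np.asarray(history, dtype=float)
    eps = 1e-12  # guards against division by zero for all-zero vectors
    t = target / (np.linalg.norm(target) + eps)
    h = history / (np.linalg.norm(history, axis=1, keepdims=True) + eps)
    sims = h @ t                      # cosine similarity of every row with the target
    order = np.argsort(-sims)[:k]     # highest similarity first
    return order, sims[order]
```

Because raw tabular features live on very different scales, standardizing the columns before computing similarities is usually advisable; the summary does not say how AdaRec handles this.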

Causal Structure Learning

Identify features that causally influence the target variable to focus the LLM's reasoning

Model or implementation: Fast Causal Inference (FCI) Algorithm
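The sketch below is not FCI. It is a deliberately simplified stand-in that shows the kind of conditional-independence screening that constraint-based methods such as FCI build on: a feature is kept only if it is dependent on the target marginally and remains dependent when conditioning on each other single feature (Fisher z tests on partial correlations). Function names and the restriction to conditioning sets of size one are assumptions for illustration; the defaults `alpha=0.1` and `max_features=15` mirror the hyperparameters listed in this summary.

```python
import math
import numpy as np

def fisher_z_pvalue(r, n, cond_size):
    """Two-sided p-value for H0: the (partial) correlation r is zero."""
    r = min(max(r, -0.999999), 0.999999)  # keep atanh finite
    stat = math.sqrt(n - cond_size - 3) * abs(math.atanh(r))
    return math.erfc(stat / math.sqrt(2.0))

def partial_corr(corr, i, j, k):
    """Correlation of variables i and j given k, or None when it is undefined."""
    den = (1.0 - corr[i, k] ** 2) * (1.0 - corr[j, k] ** 2)
    if den <= 1e-12:  # i or j is (almost) collinear with k
        return None
    return (corr[i, j] - corr[i, k] * corr[j, k]) / math.sqrt(den)

def screen_features(X, y, alpha=0.1, max_features=15):
    """Indices of features not separated from y by any single other feature."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = X.shape
    corr = np.corrcoef(np.column_stack([X, y]), rowvar=False)
    t = d  # column index of the target inside `corr`
    kept = []
    for j in range(d):
        if fisher_z_pvalue(corr[j, t], n, 0) > alpha:
            continue  # marginally independent of the target
        separated = False
        for k in range(d):
            if k == j:
                continue
            pc = partial_corr(corr, j, t, k)
            if pc is not None and fisher_z_pvalue(pc, n, 1) > alpha:
                separated = True  # feature k explains away the dependence
                break
        if not separated:
            kept.append((abs(corr[j, t]), j))
    kept.sort(reverse=True)  # strongest marginal association first
    return [j for _, j in kept[:max_features]]
```

A real FCI run additionally searches larger conditioning sets and orients edges while allowing for latent confounders, so this screen should be read only as intuition for why proxy features get filtered out.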

Structured Reasoner

Generate the final recommendation by synthesizing narrative, peer patterns, and causal factors

Model or implementation: LLM (Claude-3.5/Qwen-2.5/Llama-3.1)
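To make the synthesis step concrete, here is a hedged sketch of how the three inputs (narrative profile, similar cases, causal features) might be assembled into a single prompt. The template wording, section labels, and the `build_prompt` signature are assumptions; the paper's actual prompt is not reproduced in this summary.

```python
def build_prompt(profile, peers, causal_features, task):
    """Assemble a dual-channel prompt.

    profile: narrative text for the target user
    peers: list of (peer_profile_text, observed_outcome) pairs
    causal_features: names of features flagged by causal discovery
    task: one-line task instruction
    """
    lines = [f"Task: {task}", "", "User profile:", profile, ""]
    lines.append("Similar historical users (horizontal channel):")
    for i, (peer_profile, outcome) in enumerate(peers, start=1):
        lines.append(f"{i}. {peer_profile} -> outcome: {outcome}")
    lines += ["", "Causally relevant features (vertical channel):"]
    lines += [f"- {name}" for name in causal_features]
    lines += ["", "Answer with the prediction and a one-sentence rationale."]
    return "\n".join(lines)
```

The returned string would then be sent to whichever backbone is in use (Claude-3.5, Qwen-2.5, or Llama-3.1).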

Novel Architectural Elements

Integration of an explicit Causal Structure Learning module (FCI) within the prompt construction pipeline to filter irrelevant features before generation
Dual-channel reasoning paradigm splitting context into 'Horizontal' (peer) and 'Vertical' (causal) streams

Modeling

Base Model: Claude-3.5-Sonnet, Llama-3.1-70B, Qwen-2.5-32B

Training Method: Supervised Fine-Tuning (SFT) and Kahneman-Tversky Optimization (KTO)

Objective Functions:

Purpose: Optimize for binary classification tasks.

Formally: Standard Cross-Entropy Loss (SFT)

Purpose: Optimize for ranking/preference tasks.

Formally: Kahneman-Tversky Optimization (KTO) loss
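For reference, the two objectives can be written out as follows. These are the standard textbook forms (token-level cross-entropy for SFT, and the KTO loss as given in its original formulation); the summary does not state the values of $\beta$, $\lambda_D$, or $\lambda_U$ that AdaRec uses, nor whether it deviates from these forms.

```latex
% SFT: negative log-likelihood of the target tokens y given prompt x
\mathcal{L}_{\mathrm{SFT}}(\theta)
  = -\,\mathbb{E}_{(x,y)\sim\mathcal{D}}
    \sum_{t=1}^{|y|} \log \pi_\theta\!\left(y_t \mid x, y_{<t}\right)

% KTO: implied reward relative to a reference policy
r_\theta(x,y) = \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)},
\qquad
z_0 = \mathrm{KL}\!\left(\pi_\theta(y' \mid x)\,\|\,\pi_{\mathrm{ref}}(y' \mid x)\right)

v(x,y) =
\begin{cases}
  \lambda_D\,\sigma\!\left(\beta\,(r_\theta(x,y) - z_0)\right) & \text{if } y \text{ is desirable} \\
  \lambda_U\,\sigma\!\left(\beta\,(z_0 - r_\theta(x,y))\right) & \text{if } y \text{ is undesirable}
\end{cases}

\mathcal{L}_{\mathrm{KTO}}(\pi_\theta, \pi_{\mathrm{ref}})
  = \mathbb{E}_{(x,y)\sim\mathcal{D}}\!\left[\lambda_y - v(x,y)\right]
```

Here $\sigma$ is the logistic function and $\lambda_y$ equals $\lambda_D$ or $\lambda_U$ depending on the label of $y$. KTO needs only a binary desirable/undesirable signal per output, not paired comparisons.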

Adaptation: Lightweight fine-tuning (SFT/KTO)

Training Data:

Synthetic data generated by AdaRec itself is used for fine-tuning (distillation approach)

Key Hyperparameters:

k_shots: 5
eta1_similarity_pool: 2000
eta2_causal_pool: 1000
p_causal_features: 15
significance_level_alpha: 0.1
learning_rate_sft: 1e-4
learning_rate_kto: 5e-5
epochs: 2
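The hyperparameters listed above could be collected in a single immutable config object, as in this sketch. The class name is hypothetical, and the comments only paraphrase what each field name suggests; the summary does not define them further.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AdaRecHyperparameters:
    """Values as reported in the summary; the class itself is illustrative."""
    k_shots: int = 5                       # in-context examples per prompt
    eta1_similarity_pool: int = 2000       # pool size for similarity search (per the name)
    eta2_causal_pool: int = 1000           # reference-set size for causal discovery (per the name)
    p_causal_features: int = 15            # number of causal features kept (per the name)
    significance_level_alpha: float = 0.1  # significance level for independence tests
    learning_rate_sft: float = 1e-4
    learning_rate_kto: float = 5e-5
    epochs: int = 2
```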

Compute: 4 NVIDIA V100 GPUs

Comparison to Prior Work

vs. LightGBM: Uses semantic understanding and in-context learning rather than just feature interaction
vs. RecMind: Uses structured dual-channel reasoning (causal + similarity) rather than expensive self-planning
vs. MINT: Uses dynamic causal feature selection via FCI rather than static retrieval queries
vs. TALLRec [not cited in paper]: TALLRec uses instruction tuning on large datasets; AdaRec focuses on few-shot in-context learning with causal guidance

Limitations

Dependence on the quality of the reference set for causal discovery (FCI performance)
Computational cost of running FCI and LLM inference compared to lightweight ML models like LightGBM
Performance gains vary by LLM backbone (Claude-3.5 vs Llama-3.1)

Reproducibility

Code: https://anonymous.4open.science/r/AdaRec-CE5C

Uses public e-commerce datasets (specific source names are not cited in the excerpt summarized here). Hyperparameters are provided.

📊 Experiments & Results

Evaluation Setup

Few-shot and Zero-shot prediction on tabular e-commerce data

Benchmarks:

Customer Response Prediction (Binary Classification)
Brand Recommendation (Top-K Ranking; selection from 17 brands)

Metrics:

F1 score
Precision
Recall
Expected CTR

Statistical methodology: Not explicitly reported in the paper
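For the binary classification benchmark, precision, recall, and F1 follow the usual definitions. The sketch below computes them from label lists; the function name is illustrative, and it assumes the positive class is encoded as 1.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Expected CTR is not defined in this summary, so it is left out of the sketch.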

Key Results

AdaRec significantly outperforms both traditional ML and LLM baselines in few-shot settings.

Benchmark	Metric	Baseline	This Paper	Δ
Customer Response Prediction (5-shot)	F1	86.67	94.33	+7.66
Brand Recommendation (5-shot)	CTR	9.0	10.3	+1.3

Narrative profiling demonstrates massive gains over expert profiling in zero-shot scenarios, proving better adaptability.

Benchmark	Metric	Baseline	This Paper	Δ
Customer Response Prediction (0-shot)	F1	55.58	74.13	+18.55

The model shows strong cross-task generalization, retaining performance when fine-tuned on one task and tested on another.

Benchmark	Metric	Baseline	This Paper	Δ
Brand Recommendation (5-shot)	CTR	9.7	9.7	0.0
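The Δ column reports absolute differences in metric points, which is easy to confuse with relative improvement. The short sketch below recomputes both from the reported baseline and AdaRec values; the helper name `deltas` is an assumption for illustration.

```python
# (benchmark, metric, baseline, this paper) as reported in the results table
rows = [
    ("Customer Response Prediction (5-shot)", "F1", 86.67, 94.33),
    ("Brand Recommendation (5-shot)", "CTR", 9.0, 10.3),
    ("Customer Response Prediction (0-shot)", "F1", 55.58, 74.13),
    ("Brand Recommendation (5-shot)", "CTR", 9.7, 9.7),
]

def deltas(baseline, ours):
    """Absolute difference in points and relative improvement in percent."""
    absolute = round(ours - baseline, 2)
    relative = round(100.0 * (ours - baseline) / baseline, 1)
    return absolute, relative

for name, metric, baseline, ours in rows:
    absolute, relative = deltas(baseline, ours)
    print(f"{name} [{metric}]: {absolute:+.2f} points, {relative:+.1f}% relative")
```

Read this way, the few-shot and zero-shot F1 gains quoted as roughly 8% and 19% are point differences; the corresponding relative gains are larger.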

Experiment Figures

Comparison of Expert Profiling vs. Narrative Profiling text outputs

Ablation study of different components (Narrative only vs +Causal vs +History)

Main Takeaways

AdaRec outperforms LightGBM by 7.66 F1 points in the 5-shot setting, even though LightGBM is trained on 1M+ samples, which indicates high data efficiency
Narrative profiling is far more robust than expert profiling in zero-shot settings (+18.55 F1 points), as it adapts to data distributions automatically
The dual-channel architecture is robust: ablation shows removing historical patterns causes the largest drop, while causal features provide moderate but essential gains for precision
Cross-task evaluation (expected CTR of 9.7 in both conditions) suggests the learned representations generalize, reducing the need for task-specific retraining

📚 Prerequisite Knowledge

Prerequisites

Collaborative Filtering concepts (similarity, peers)
Causal Discovery (Fast Causal Inference algorithm)
In-Context Learning with LLMs

Key Terms

Narrative Profiling: Converting numerical user features into natural language descriptions based on statistical distributions (e.g., a raw spending amount becomes 'top 10% spender')

FCI: Fast Causal Inference, a constraint-based causal discovery algorithm that remains applicable in the presence of latent confounders and selection bias; used here to identify which features actually drive the target outcome

Dual-Channel Reasoning: An architecture combining 'Horizontal' retrieval (finding similar users) and 'Vertical' analysis (identifying causal features) to guide the LLM

Mutual Information: A measure of the mutual dependence between two variables, used here to weight feature importance

KTO: Kahneman-Tversky Optimization, an alignment loss that learns from binary desirable/undesirable feedback on individual outputs rather than paired preferences; used here for fine-tuning on the ranking task

In-context Learning: Providing examples and instructions in the prompt to guide the LLM's behavior without updating model weights