Jian Jia, Yipei Wang, Yan Li, Honggang Chen, Xuehan Bai, Zhaocheng Liu, Jian Liang, Quan Chen, Han Li, Peng Jiang, Kun Gai
arXiv
(2024)
Recommendation, P13N
📝 Paper Summary
Recommender Systems, Large Language Model Integration, Representation Learning
LEARN improves recommendations by using frozen LLMs as item encoders and a specialized transformer to align open-world semantic knowledge with collaborative user preferences, avoiding the high cost of text-based fine-tuning.
Core Problem
Traditional recommender systems rely on ID embeddings that lack semantic understanding, while integrating LLMs via 'Rec-to-LLM' (converting history to text) is computationally prohibitive and causes catastrophic forgetting.
Why it matters:
Industrial constraints (e.g., 800+ item histories) make standard LLM fine-tuning or inference unaffordable, because self-attention cost grows as $O(N^2)$ in the context length $N$
ID-based methods fail in cold-start scenarios and cannot transfer knowledge across domains like pre-trained models in CV or NLP
Fine-tuning LLMs on collaborative data often degrades their general open-world reasoning capabilities (catastrophic forgetting)
Concrete Example: On a short-video platform where a user watches ~800 videos weekly, converting a multi-month history into a text prompt for an LLM exceeds context windows and compute budgets. Existing 'Rec-to-LLM' methods cannot handle this scale efficiently.
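To make the $O(N^2)$ argument concrete, the following back-of-the-envelope sketch counts pairwise attention interactions for an 800-item history under two layouts: one flattened text prompt versus per-item encoding followed by attention over item vectors. The value of 50 tokens per item is an assumption chosen for illustration, not a figure from the paper, and the count ignores model width and depth.

```python
# Rough count of pairwise attention interactions in a single attention layer.
# TOKENS_PER_ITEM is a hypothetical value; the summary only gives the history length.

HISTORY_ITEMS = 800      # from the text: 800+ items in a user's history
TOKENS_PER_ITEM = 50     # assumption: average tokens in one item's description


def rec_to_llm_pairs(n_items: int, tokens_per_item: int) -> int:
    """History flattened into one prompt: attention spans every token pair."""
    prompt_len = n_items * tokens_per_item
    return prompt_len ** 2


def llm_to_rec_pairs(n_items: int, tokens_per_item: int) -> int:
    """Each item encoded on its own, then attention over one vector per item."""
    per_item_encoding = n_items * tokens_per_item ** 2  # can be cached per item
    item_level_attention = n_items ** 2
    return per_item_encoding + item_level_attention


flat = rec_to_llm_pairs(HISTORY_ITEMS, TOKENS_PER_ITEM)
split = llm_to_rec_pairs(HISTORY_ITEMS, TOKENS_PER_ITEM)
print(f"flattened prompt: {flat:,} token pairs")
print(f"per-item encoding + item-level attention: {split:,} pairs")
print(f"ratio: {flat / split:.0f}x")
```

Under these assumed numbers the flattened prompt needs several hundred times more pairwise interactions, and the per-item encodings can additionally be computed once and reused across users.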
Inverts the paradigm from 'Rec-to-LLM' to 'LLM-to-Rec': instead of forcing rec data into LLM formats, it extracts semantic vectors from a frozen LLM and adapts them to recommendation tasks
Separates content extraction (via frozen LLM) from preference alignment (via a trainable transformer), preserving open-world knowledge while learning collaborative patterns
Uses a twin-tower architecture where the item encoder shares weights with the user tower, optimized via contrastive learning on dense user actions
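The three points above can be sketched in a few lines of pure Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `frozen_content_embedding` is a hypothetical stand-in for the frozen LLM (it only produces a fixed vector per text), `PreferenceAligner` is a hypothetical single-head causal attention layer standing in for the trainable transformer, and the embedding width of 4 is chosen for readability.

```python
import math
import random

DIM = 4  # hypothetical embedding width, kept tiny for readability


def frozen_content_embedding(text: str) -> list[float]:
    """Stand-in for the frozen LLM encoder: a fixed vector per text, never trained."""
    rng = random.Random(text)  # seeded by the text, so the output never changes
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]


def matvec(matrix, vec):
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class PreferenceAligner:
    """Hypothetical trainable module: one single-head causal attention layer."""

    def __init__(self, seed: int = 0):
        rng = random.Random(seed)

        def init():
            return [[rng.gauss(0.0, 0.5) for _ in range(DIM)] for _ in range(DIM)]

        # These weights are the only parameters that training would update.
        self.w_q, self.w_k, self.w_v = init(), init(), init()

    def __call__(self, content_seq):
        q = [matvec(self.w_q, x) for x in content_seq]
        k = [matvec(self.w_k, x) for x in content_seq]
        v = [matvec(self.w_v, x) for x in content_seq]
        outputs = []
        for t in range(len(content_seq)):
            # Causal mask: position t attends only to positions 0..t.
            scores = [
                sum(a * b for a, b in zip(q[t], k[s])) / math.sqrt(DIM)
                for s in range(t + 1)
            ]
            weights = softmax(scores)
            outputs.append(
                [sum(w * v[s][d] for s, w in enumerate(weights)) for d in range(DIM)]
            )
        return outputs


aligner = PreferenceAligner()  # one set of weights, shared by both towers

history = ["wireless earbuds", "phone case", "usb-c charger"]
candidates = ["power bank", "garden hose"]

# User tower: encode the history, keep the last position as the user embedding.
user_vec = aligner([frozen_content_embedding(t) for t in history])[-1]
# Item tower: the same aligner applied to a single-item sequence.
item_vecs = {t: aligner([frozen_content_embedding(t)])[0] for t in candidates}
scores = {t: sum(u * i for u, i in zip(user_vec, vec)) for t, vec in item_vecs.items()}
print(scores)
```

Because the aligner here is untrained, the printed scores carry no meaning; the sketch only shows where the frozen and trainable parts sit and how the two towers share weights.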
Architecture
The overall LEARN framework, consisting of a User Tower and an Item Tower.
Evaluation Highlights
Achieves an average 13.95% improvement in Recall@10 across six Amazon Review datasets compared to state-of-the-art baselines
Successfully deployed in a real large-scale industrial short video platform (verified via online A/B testing)
State-of-the-art performance in three metrics across six public datasets (Amazon Reviews)
Breakthrough Assessment
8/10
Significant for proposing a scalable 'LLM-to-Rec' architecture that works in industrial settings (proven by A/B tests) and achieving double-digit gains on public benchmarks, effectively addressing the efficiency-effectiveness trade-off in LLM4Rec.
Compute: LLM parameters frozen to reduce burden; specific GPU hours not reported in the paper
Comparison to Prior Work
vs. Rec-to-LLM (TALLRec, LlamaRec): LEARN adapts LLM knowledge to Rec (LLM-to-Rec) using a frozen encoder + trainable adapter, rather than fine-tuning the LLM on text-formatted history
vs. ID-based (SASRec): LEARN utilizes semantic text content, enabling better generalization and cold-start handling
Limitations
Depends on the quality of textual descriptions for items
Frozen LLM may still be computationally heavy for inference compared to pure ID embeddings (though lighter than fine-tuning)
Recency-based sampling assumption may not hold for all user interest types
Reproducibility
Code is not provided. The industrial dataset is proprietary; the Amazon Review datasets are public. Prompts are described conceptually in Figure 3. Hyperparameters for sampling and embedding dimensions are provided.
📊 Experiments & Results
Evaluation Setup
Sequential recommendation on industrial and public datasets
Benchmarks:
Amazon Reviews (Sequential Recommendation)
Industrial Dataset (Short Video Recommendation) [New]
Metrics:
Recall@10
NDCG@10
Statistical methodology: Not explicitly reported in the paper
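For readers unfamiliar with the two metrics listed above, here is a minimal sketch of their standard binary-relevance definitions. The exact evaluation protocol used in the paper (for example, full ranking versus sampled negatives) is not stated in this summary, so the function names and signatures below are illustrative assumptions.

```python
import math


def recall_at_k(ranked_items, relevant_items, k=10):
    """Fraction of relevant items that appear in the top-k of the ranking."""
    if not relevant_items:
        return 0.0
    hits = sum(1 for item in ranked_items[:k] if item in relevant_items)
    return hits / len(relevant_items)


def ndcg_at_k(ranked_items, relevant_items, k=10):
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 2)  # rank is 0-based, so the top slot divides by log2(2)
        for rank, item in enumerate(ranked_items[:k])
        if item in relevant_items
    )
    ideal_hits = min(len(relevant_items), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Example: the single held-out item "c" is ranked third.
ranking = ["a", "b", "c", "d"]
held_out = {"c"}
print(recall_at_k(ranking, held_out, k=10), ndcg_at_k(ranking, held_out, k=10))
```

Recall@10 only asks whether the held-out item made the top ten, while NDCG@10 also rewards placing it nearer the top.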
Experiment Figures
Conceptual comparison between 'Rec-to-LLM' and 'LLM-to-Rec' (LEARN).
Main Takeaways
The 'LLM-to-Rec' adaptation strategy outperforms 'Rec-to-LLM' methods in both efficiency and effectiveness for industrial-scale applications.
Freezing the LLM and using a separate alignment module (PAL) effectively preserves open-world knowledge while adapting to collaborative tasks.
The method achieves substantial gains (+13.95% Recall@10 on average) on public benchmarks, supporting the architecture's advantage over standard ID-based and BERT-based baselines.
Online A/B testing confirms the profitability and practical viability of the framework in a real-world short video platform.
📚 Prerequisite Knowledge
Prerequisites
Recommender Systems (ID vs. Content embeddings)
Transformer architecture (bidirectional vs. causal self-attention)
Contrastive Learning
Large Language Models (Fine-tuning vs. Freezing)
Key Terms
Rec-to-LLM: Adapting recommendation data into textual conversation formats to fine-tune LLMs (the traditional/expensive approach)
LLM-to-Rec: Adapting knowledge from LLMs to recommendation systems by using LLMs as feature extractors for standard recommendation models
CEX: Content EXtraction module—uses a frozen pre-trained LLM to convert item text into content embeddings
PAL: Preference ALignment module—a transformer that maps content embeddings to user preference embeddings
Catastrophic Forgetting: The tendency of LLMs to lose pre-trained general knowledge when fine-tuned heavily on a specific downstream task
Cold-start: The challenge of making recommendations for users or items that have little to no historical interaction data
Dense all action loss: A contrastive loss function that utilizes all items in a target sequence as positive samples against negatives
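As an illustration of the last term, the sketch below shows one plausible reading of a dense all action loss: an InfoNCE-style contrastive loss averaged over every item in the target sequence, each taken in turn as the positive against a shared pool of negatives. The function name, the dot-product similarity, and the temperature of 0.1 are assumptions made for this example; the paper's exact formulation and sampling scheme may differ.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def dense_all_action_loss(user_vec, target_items, negative_items, temperature=0.1):
    """Mean InfoNCE loss where every target item serves as the positive in turn."""
    neg_logits = [dot(user_vec, n) / temperature for n in negative_items]
    losses = []
    for pos in target_items:
        pos_logit = dot(user_vec, pos) / temperature
        logits = [pos_logit] + neg_logits
        top = max(logits)  # subtract the max for numerical stability
        log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
        losses.append(log_denominator - pos_logit)  # equals -log softmax(positive)
    return sum(losses) / len(losses)


# Toy 2-d embeddings (hypothetical values).
targets = [[1.0, 0.0], [0.9, 0.1]]      # items the user acted on in the target window
negatives = [[-1.0, 0.0], [0.0, 1.0]]   # e.g. items drawn from other users' sequences

aligned_user = [1.0, 0.0]
misaligned_user = [-1.0, 0.0]
print(dense_all_action_loss(aligned_user, targets, negatives))
print(dense_all_action_loss(misaligned_user, targets, negatives))
```

The user embedding that points toward the target items receives the lower loss, which is the signal that would drive training of the alignment module.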