LEARN: Knowledge Adaptation from Large Language Model to Recommendation for Practical Industrial Application

📝 Paper Summary

Recommender Systems, Large Language Model Integration, Representation Learning

LEARN improves recommendations by using frozen LLMs as item encoders and a specialized transformer to align open-world semantic knowledge with collaborative user preferences, avoiding the high cost of text-based fine-tuning.

Core Problem

Traditional recommender systems rely on ID embeddings that lack semantic understanding, while integrating LLMs via 'Rec-to-LLM' (converting history to text) is computationally prohibitive and causes catastrophic forgetting.

Why it matters:

Industrial constraints (e.g., 800+ item histories) make standard LLM fine-tuning or inference unaffordable, since self-attention cost grows as $O(N^2)$ in the context length N
ID-based methods fail in cold-start scenarios and cannot transfer knowledge across domains the way pre-trained models do in CV or NLP
Fine-tuning LLMs on collaborative data often degrades their general open-world reasoning capabilities (catastrophic forgetting)

Concrete Example: In a short video platform where a user watches ~800 videos weekly, converting this multi-month history into a text prompt for an LLM exceeds context windows and compute budgets. Existing 'Rec-to-LLM' methods fail to handle this scale efficiently.

Key Novelty

LLM-driven KnowlEdge Adaptive RecommeNdation (LEARN)

Inverts the paradigm from 'Rec-to-LLM' to 'LLM-to-Rec': instead of forcing rec data into LLM formats, it extracts semantic vectors from a frozen LLM and adapts them to recommendation tasks
Separates content extraction (via frozen LLM) from preference alignment (via a trainable transformer), preserving open-world knowledge while learning collaborative patterns
Uses a twin-tower architecture where the item encoder shares weights with the user tower, optimized via contrastive learning on dense user actions

Architecture

The overall LEARN framework, consisting of a User Tower and an Item Tower.

Evaluation Highlights

Achieves an average 13.95% improvement in Recall@10 across six Amazon Review datasets compared to state-of-the-art baselines
Successfully deployed in a real large-scale industrial short video platform (verified via online A/B testing)
State-of-the-art performance in three metrics across six public datasets (Amazon Reviews)

Breakthrough Assessment

8/10

Significant for proposing a scalable 'LLM-to-Rec' architecture that works in industrial settings (proven by A/B tests) and achieving double-digit gains on public benchmarks, effectively addressing the efficiency-effectiveness trade-off in LLM4Rec.

⚙️ Technical Details

Problem Definition

Setting: Sequential Recommendation (Next Item Prediction)

Inputs: User history interaction sequence U_hist (chronological item sequence) and candidate items

Outputs: User embedding E_user and Item embeddings E_item used to compute similarity scores for ranking
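
As a rough illustration of how E_user and E_item turn into a ranking, the sketch below scores each candidate against the user embedding and keeps the top k. The summary only says "similarity scores", so the use of cosine similarity, the function names, and the toy 2-dimensional vectors are all assumptions made for this example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(user_emb, item_embs, k=10):
    """Return the k (item_id, score) pairs most similar to the user embedding."""
    scored = [(item_id, cosine(user_emb, emb)) for item_id, emb in item_embs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy data, purely illustrative.
e_user = [1.0, 0.0]
e_items = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
top = rank_candidates(e_user, e_items, k=3)
```

In a deployed system this exhaustive scan would normally be replaced by an approximate nearest-neighbour index over precomputed item embeddings.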

Pipeline Flow

Input Processing (Prompt Construction)
Content Extraction (Frozen LLM)
Preference Alignment (Trainable Transformer)
Similarity Calculation (Contrastive)

System Modules

Prompt Constructor

Converts item data into concise textual descriptions

Model or implementation: Template-based
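
A template-based constructor could look roughly like the sketch below. The actual template is not reproduced in this summary, so the field names (title, category, brand), the separator, and the function name are hypothetical.

```python
def build_item_prompt(item: dict) -> str:
    """Render item metadata as one concise description string."""
    # Hypothetical (label, key) pairs; the real template fields are not given here.
    fields = [("Title", "title"), ("Category", "category"), ("Brand", "brand")]
    parts = []
    for label, key in fields:
        value = str(item.get(key, "")).strip()
        if value:  # skip attributes that are missing or empty
            parts.append(f"{label}: {value}")
    return "; ".join(parts)


prompt = build_item_prompt(
    {"title": "Trail Running Shoes", "category": "Sports", "brand": ""}
)
```

Skipping empty attributes keeps the description short, which matters when every item must pass through the LLM once.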

CEX (Content EXtraction)

Encodes item text into semantic vectors while preserving open-world knowledge

Model or implementation: Pretrained LLM (Frozen) + Average Pooling
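
The average-pooling step can be sketched as follows: given the per-token hidden states from the frozen LLM, average them into a single item vector. Excluding padding tokens through an attention mask is an assumption of this sketch, not something the summary states, and plain Python lists stand in for tensors.

```python
def mean_pool(hidden_states, attention_mask):
    """Average the token vectors whose mask entry is 1.

    hidden_states: list of token vectors (list of lists of floats)
    attention_mask: list of 0/1 flags, one per token
    """
    dim = len(hidden_states[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(hidden_states, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [t / count for t in total]


# Three tokens, the last one is padding and is ignored.
pooled = mean_pool([[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]], [1, 1, 0])
```

Because the LLM is frozen, these pooled vectors can be computed once per item and cached.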

PAL (Preference ALignment)

Models sequential user interests based on content embeddings

Model or implementation: 12-layer Transformer (BERT-base config) with Causal Attention
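
Causal attention means position i may only attend to positions j <= i, so each step of the user sequence sees its past but not its future. The sketch below shows the mask and the resulting attention weights for one head; the function names are illustrative, and this is not the paper's implementation.

```python
import math


def causal_mask(n):
    """n x n boolean matrix; entry [i][j] is True when j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def causal_attention_weights(scores):
    """Row-wise softmax over raw attention scores with future positions masked out."""
    n = len(scores)
    mask = causal_mask(n)
    weights = []
    for i in range(n):
        allowed = [scores[i][j] for j in range(n) if mask[i][j]]
        peak = max(allowed)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in allowed]
        total = sum(exps)
        # Allowed positions are exactly the first i + 1, so pad the rest with zeros.
        weights.append([e / total for e in exps] + [0.0] * (n - len(allowed)))
    return weights


w = causal_attention_weights([[0.0] * 3 for _ in range(3)])
```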

Novel Architectural Elements

Decoupled architecture: Frozen LLM for features (CEX) + Trainable Transformer for alignment (PAL), avoiding end-to-end LLM fine-tuning
Twin-tower design where Item Tower shares weights/architecture (specifically 'ItemTower(a)') with the User Tower to align spaces

Modeling

Base Model: Pretrained LLM (the specific variant is not named in the text)

Training Method: Contrastive Learning (Self-supervised)

Objective Functions:

Purpose: Maximize similarity between user embedding and relevant target item embeddings while minimizing similarity to negatives.

Formally: Dense all action loss, which samples N_h items from the history and N_t items from the target sequence.
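
One plausible reading of this objective is an InfoNCE-style loss averaged over every (history, target) pair, sketched below. The summary does not give the exact formula, so the temperature, the dot-product similarity, the way negatives are supplied, and the function name are all assumptions.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def dense_all_action_loss(user_embs, target_embs, negative_embs, temperature=0.1):
    """Mean contrastive loss over all N_h x N_t (user state, target item) pairs.

    user_embs: N_h user-side embeddings sampled from the history
    target_embs: N_t embeddings of items in the target sequence (positives)
    negative_embs: embeddings of negative items (sampling scheme assumed)
    """
    losses = []
    for u in user_embs:
        neg_logits = [dot(u, n) / temperature for n in negative_embs]
        for t in target_embs:
            pos_logit = dot(u, t) / temperature
            logits = [pos_logit] + neg_logits
            peak = max(logits)  # log-sum-exp with max subtraction for stability
            log_z = peak + math.log(sum(math.exp(l - peak) for l in logits))
            losses.append(log_z - pos_logit)  # equals -log softmax(positive)
    return sum(losses) / len(losses)


aligned = dense_all_action_loss([[1.0, 0.0]], [[1.0, 0.0]], [[0.0, 1.0]])
misaligned = dense_all_action_loss([[1.0, 0.0]], [[0.0, 1.0]], [[1.0, 0.0]])
```

The loss is small when user embeddings sit closer to their targets than to the negatives, and large otherwise.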

Training Data:

Large-scale industrial dataset (Short video platform)
Two-stage sampling: Random sampling then weighted sampling based on recency
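
The two-stage idea can be sketched as below: a uniform random draw first, then a recency-weighted draw from that pool. The summary gives neither the weighting function nor the unit being sampled, so the linear recency weight, the (item_id, timestamp) representation, and the function name are assumptions. sample_weight_alpha and sample_weight_beta presumably parameterize the real weighting, but their formula is not given here, so the sketch does not use them.

```python
import random


def two_stage_sample(interactions, n_candidates, n_final, seed=0):
    """interactions: list of (item_id, timestamp) pairs."""
    rng = random.Random(seed)
    # Stage 1: uniform random sampling without replacement.
    pool = rng.sample(interactions, min(n_candidates, len(interactions)))
    # Stage 2: weighted sampling (with replacement) that favours recent entries.
    # Hypothetical weight: linear in how far an entry is from the oldest one.
    oldest = min(ts for _, ts in pool)
    weights = [ts - oldest + 1 for _, ts in pool]
    return rng.choices(pool, weights=weights, k=n_final)


history = [(f"item{i}", i) for i in range(100)]
sampled = two_stage_sample(history, n_candidates=50, n_final=10)
```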

Key Hyperparameters:

user_embedding_dim: 64
sample_weight_alpha: 10
sample_weight_beta: 10000
N_h (history samples): 10
N_t (target samples): 10
transformer_layers: 12 (BERT-base config)

Compute: LLM parameters frozen to reduce burden; specific GPU hours not reported in the paper

Comparison to Prior Work

vs. Rec-to-LLM (TALLRec, LlamaRec): LEARN adapts LLM knowledge to Rec (LLM-to-Rec) using a frozen encoder + trainable adapter, rather than fine-tuning the LLM on text-formatted history
vs. ID-based (SASRec): LEARN utilizes semantic text content, enabling better generalization and cold-start handling

Limitations

Depends on the quality of textual descriptions for items
Frozen LLM may still be computationally heavy for inference compared to pure ID embeddings (though lighter than fine-tuning)
Recency-based sampling assumption may not hold for all user interest types

Reproducibility

Code is not provided. The industrial dataset is proprietary; the Amazon Review datasets are public. Prompts are described conceptually in Figure 3. Hyperparameters for sampling and embedding dimensions are provided.

📊 Experiments & Results

Evaluation Setup

Sequential recommendation on industrial and public datasets

Benchmarks:

Amazon Reviews (Sequential Recommendation)
Industrial Dataset (Short Video Recommendation) [New]

Metrics:

Recall@10
NDCG@10
Statistical methodology: Not explicitly reported in the paper
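
For reference, the two metrics are commonly computed as in the sketch below, using binary relevance. The exact evaluation protocol (for example, how many held-out targets each user has) is not described in this summary, so treat this as the standard textbook definition rather than the paper's evaluation code.

```python
import math


def recall_at_k(ranked, relevant, k=10):
    """Fraction of relevant items that appear in the top k of the ranking."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked, relevant, k=10):
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(position + 2)
        for position, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(position + 2) for position in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


ranking = ["a", "b", "c"]
targets = {"b"}
r = recall_at_k(ranking, targets)
n = ndcg_at_k(ranking, targets)
```

With a single held-out target, Recall@10 is 1 if the target is in the top 10 and 0 otherwise, while NDCG@10 additionally rewards placing it near the top.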

Experiment Figures

Conceptual comparison between 'Rec-to-LLM' and 'LLM-to-Rec' (LEARN).

Main Takeaways

The 'LLM-to-Rec' adaptation strategy outperforms 'Rec-to-LLM' methods in efficiency and effectiveness for industrial scale applications.
Freezing the LLM and using a separate alignment module (PAL) effectively preserves open-world knowledge while adapting to collaborative tasks.
The method achieves substantial gains (+13.95% Recall@10) on public benchmarks, validating the architecture's superiority over standard ID-based and BERT-based baselines.
Online A/B testing confirms the profitability and practical viability of the framework in a real-world short video platform.

📚 Prerequisite Knowledge

Prerequisites

Recommender Systems (ID vs. Content embeddings)
Transformer architecture (Self-attention vs. Causal attention)
Contrastive Learning
Large Language Models (Fine-tuning vs. Freezing)

Key Terms

Rec-to-LLM: Adapting recommendation data into textual conversation formats to fine-tune LLMs (the traditional/expensive approach)

LLM-to-Rec: Adapting knowledge from LLMs to recommendation systems by using LLMs as feature extractors for standard recommendation models

CEX: Content EXtraction module—uses a frozen pre-trained LLM to convert item text into content embeddings

PAL: Preference ALignment module—a transformer that maps content embeddings to user preference embeddings

Catastrophic Forgetting: The tendency of LLMs to lose pre-trained general knowledge when fine-tuned heavily on a specific downstream task

Cold-start: The challenge of making recommendations for users or items that have little to no historical interaction data

Dense all action loss: A contrastive loss function that utilizes all items in a target sequence as positive samples against negatives