LLM-KT: A Versatile Framework for Knowledge Transfer from Large Language Models to Collaborative Filtering

📝 Paper Summary

Collaborative Filtering (CF), Knowledge Distillation from LLMs, Recommender Systems

LLM-KT improves collaborative filtering models by training them to reconstruct LLM-generated user profile embeddings within their internal layers, rather than using these features as direct inputs.

Core Problem

Existing methods for transferring knowledge from LLMs to recommender systems typically require models that accept textual features as input, excluding many traditional Collaborative Filtering (CF) architectures.

Why it matters:

Traditional CF models (like Matrix Factorization) often struggle to capture nuanced user preferences from interactions alone but cannot natively process the rich reasoning chains generated by LLMs.
Current LLM enhancement methods (e.g., KAR, LLM-CF) are limited to context-aware models that handle input features, leaving a gap for improving simpler, widely-used CF baselines.
Direct use of LLMs for inference is prohibitively expensive, necessitating efficient knowledge transfer methods.

Concrete Example: A model such as NeuMF (Neural Matrix Factorization) accepts only user and item IDs, so it cannot take an LLM-generated text summary like 'User loves 90s sci-fi' as input. Existing methods would either fail or require architecture changes, whereas LLM-KT injects this knowledge by training the model's internal representations to match the embedding of the text summary.

Key Novelty

Internal Feature Reconstruction as Knowledge Transfer

Instead of feeding LLM features as input, LLM-KT treats the LLM-generated user profile embedding as a target for the CF model to reconstruct inside its hidden layers.
This approach acts as a 'side quest' (pretext task) during training: the model learns to organize its internal user representations to align with the semantic richness of LLM profiles without changing its inference architecture.

Architecture

The LLM-KT framework architecture integrated with RecBole, showing the pipeline from dataset to model training.

Evaluation Highlights

+21% improvement in NDCG@10 for SimpleX on the Amazon CD and Vinyl dataset, compared with the unmodified base model.
Consistent improvements across three diverse baselines (NeuMF, SimpleX, MultVAE) on both MovieLens and Amazon datasets.
Performs competitively with the state-of-the-art KAR method on context-aware models (DeepFM, DCN) while remaining applicable to a broader range of architectures.

Breakthrough Assessment

6/10

A clever, versatile engineering framework that extends LLM benefits to legacy CF models. While the core idea (distillation) is known, the specific application to internal layer reconstruction for model-agnosticism is valuable.

⚙️ Technical Details

Problem Definition

Setting: Enhancing Collaborative Filtering models with side information from LLMs without architectural modification.

Inputs: User-item interaction history

Outputs: Ranked list of items (or CTR prediction)

Pipeline Flow

LLM Profile Generation
Profile Embedding
Knowledge Transfer Training (Phase 1)
Fine-tuning (Phase 2)

System Modules

Profile Generator (Preprocessing)

Generate natural language user profiles from interaction history

Model or implementation: LLM (flexible, prompt-based)

Profile Embedder (Preprocessing)

Convert text profiles into dense vectors

Model or implementation: text-embedding-ada-002

Dimensionality Reducer (Preprocessing)

Align embedding dimensions with CF model internal layer

Model or implementation: UMAP (non-learnable in pipeline)

CF Model Wrapper

Train CF model with auxiliary reconstruction loss

Model or implementation: Any CF model (NeuMF, SimpleX, MultVAE, etc.)

Novel Architectural Elements

Injection of auxiliary reconstruction loss at arbitrary intermediate layers of a black-box CF model via a 'Hook Manager' mechanism
Use of UMAP for dimensionality alignment of external knowledge embeddings to internal model states instead of learnable linear projections
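
The hook-based injection described above can be illustrated with PyTorch forward hooks. This is a minimal sketch, not the framework's actual Hook Manager: it assumes a PyTorch model (RecBole models are PyTorch modules), and the names `HookManager`, `TinyCF`, and the layer name `hidden` are hypothetical. The random `target` tensor stands in for dimension-reduced profile embeddings.

```python
import torch
from torch import nn


class HookManager:
    """Hypothetical helper: captures the output of one named submodule."""

    def __init__(self, model: nn.Module, layer_name: str):
        self.captured = None
        layer = dict(model.named_modules())[layer_name]
        # The hook runs after every forward pass of `layer`.
        self._handle = layer.register_forward_hook(self._save_output)

    def _save_output(self, module, inputs, output):
        self.captured = output

    def remove(self):
        self._handle.remove()


class TinyCF(nn.Module):
    """Toy stand-in for a CF model that only takes user IDs."""

    def __init__(self, n_users=5, dim=4, n_items=3):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, dim)
        self.hidden = nn.Linear(dim, dim)
        self.out = nn.Linear(dim, n_items)

    def forward(self, user_ids):
        return self.out(torch.relu(self.hidden(self.user_emb(user_ids))))


torch.manual_seed(0)
model = TinyCF()
hooks = HookManager(model, "hidden")

users = torch.tensor([0, 2])
scores = model(users)        # forward pass is unchanged by the hook
target = torch.randn(2, 4)   # placeholder for reduced profile embeddings
loss_kt = torch.sqrt(nn.functional.mse_loss(hooks.captured, target))
loss_kt.backward()           # gradients flow into the hooked layer
hooks.remove()
```

Because the hook only observes a layer's output, the model's `forward` signature and inference path stay untouched, which is what allows the auxiliary loss to be attached to models that accept only IDs.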

Modeling

Base Model: Varies (NeuMF, SimpleX, MultVAE, DCN, DeepFM)

Training Method: Two-phase training: (1) Knowledge Transfer with auxiliary loss, (2) Fine-tuning on main task only

Objective Functions:

Purpose: Jointly optimize recommendation and profile reconstruction.

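Formally: L = (1 - alpha) * L_model + alpha * L_KT, where L_model is the CF model's original loss and L_KT is the knowledge-transfer (reconstruction) loss.

Purpose: Reconstruct the transformed LLM profile embedding from the internal model state.

Formally: L_KT = RMSE(Z_u, Trans(P_u)), where Z_u is the internal representation of user u at the chosen layer, P_u is the LLM profile embedding of user u, and Trans is the dimensionality-reducing transformation (UMAP).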
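
As a worked example of the two formulas above, the following sketch computes the combined loss for a single user in plain Python. The function names and the toy vectors are illustrative assumptions; a real implementation would operate on batched tensors.

```python
import math


def rmse(z, target):
    """Root-mean-square error between two equal-length vectors."""
    if len(z) != len(target):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(z, target)) / len(z))


def combined_loss(loss_model, z_u, profile_target, alpha):
    """L = (1 - alpha) * L_model + alpha * L_KT, with L_KT = RMSE(Z_u, Trans(P_u))."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    loss_kt = rmse(z_u, profile_target)
    return (1.0 - alpha) * loss_model + alpha * loss_kt


z_u = [0.0, 0.0, 0.0, 0.0]       # toy internal representation
target = [3.0, 4.0, 0.0, 0.0]    # toy Trans(P_u); RMSE = sqrt(25 / 4) = 2.5
loss = combined_loss(loss_model=1.0, z_u=z_u, profile_target=target, alpha=0.2)
print(loss)                      # (1 - 0.2) * 1.0 + 0.2 * 2.5, i.e. about 1.3
```

Setting alpha to 0 recovers the model's original loss, and setting it to 1 trains on reconstruction alone.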

Training Data:

Datasets: Amazon 'CD and Vinyl', MovieLens (ML-1M)
Split: time-ordered, 70/10/20%
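
A time-ordered split can be sketched as follows. This is an illustration under two assumptions that the summary does not confirm: the cut is made globally over all interactions (rather than per user), and the three parts serve as training, validation, and test sets. The function name and the toy data are hypothetical.

```python
def time_ordered_split(interactions, percents=(70, 10, 20)):
    """Sort (user, item, timestamp) tuples by time and cut them into three parts."""
    if sum(percents) != 100:
        raise ValueError("percentages must sum to 100")
    ordered = sorted(interactions, key=lambda row: row[2])
    n = len(ordered)
    first_end = n * percents[0] // 100            # integer math avoids float rounding
    second_end = first_end + n * percents[1] // 100
    return ordered[:first_end], ordered[first_end:second_end], ordered[second_end:]


# Ten toy interactions with shuffled timestamps.
timestamps = [5, 1, 9, 3, 7, 0, 8, 2, 6, 4]
toy = [(i % 3, i, ts) for i, ts in enumerate(timestamps)]
train, valid, test = time_ordered_split(toy)
print(len(train), len(valid), len(test))
```

Every interaction in an earlier part precedes every interaction in a later part, so the model is never evaluated on events that happened before its training data.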

Key Hyperparameters:

alpha: Weight for reconstruction loss (alpha in [0, 1])
epochs: N = 70, split evenly into N/2 = 35 knowledge-transfer epochs and N/2 = 35 fine-tuning epochs
reconstruction_loss_type: RMSE (found to be best)
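
The two-phase schedule implied by these hyperparameters can be sketched as a per-epoch weight for the reconstruction loss. This is an illustrative assumption about how the phases could be wired, not the framework's code: fine-tuning "on the main task only" is modeled as setting the weight to zero, and the value alpha = 0.5 is an arbitrary placeholder.

```python
def alpha_schedule(total_epochs, alpha):
    """Weight of the reconstruction loss for each epoch.

    First half: knowledge transfer (weight = alpha).
    Second half: fine-tuning on the main task only (weight = 0).
    """
    transfer_epochs = total_epochs // 2
    return [alpha if epoch < transfer_epochs else 0.0 for epoch in range(total_epochs)]


schedule = alpha_schedule(total_epochs=70, alpha=0.5)
print(schedule[34], schedule[35])  # last transfer epoch, first fine-tuning epoch
```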

Compute: Not reported in the paper

Comparison to Prior Work

vs. KAR: LLM-KT modifies internal representations via loss rather than inputs, allowing use with non-context-aware models (e.g., Matrix Factorization).
vs. LLM-CF: Embeds knowledge into intermediate layers instead of input features.
vs. RLMRec [not cited in paper]: RLMRec aligns representation spaces via contrastive learning; LLM-KT uses direct reconstruction (RMSE) at specific layers.

Limitations

Relies on the quality of LLM-generated profiles; poor profiles may hurt performance.
UMAP transformation is non-learnable in the current setup, potentially limiting alignment flexibility.
Requires access to intermediate layers, which might be complex for some opaque model implementations.
Two-phase training adds complexity compared to standard end-to-end training.

Reproducibility

Code: https://github.com/a250/LLMRecSys_with_KnowledgeDistilation/tree/distil_framework

The code is publicly available at the GitHub link above. The paper specifies the exact LLM embedding model (text-embedding-ada-002) and the prompt strategy. Exact baseline hyperparameters (learning rates, batch sizes) are not detailed in the text but are implied to follow standard RecBole defaults or prior work.

📊 Experiments & Results

Evaluation Setup

Reranking task for general CF models; CTR prediction for context-aware models.

Benchmarks:

Amazon CD and Vinyl (Recommendation / Reranking)
MovieLens-1M (Recommendation / Reranking)

Metrics:

NDCG@10
Recall@10
Hits@10
AUC-ROC (for CTR tasks)
Statistical methodology: Not explicitly reported in the paper
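
For reference, the ranking metrics listed above can be computed for a single user as in the sketch below. It assumes binary relevance and the common definitions (Recall divides by the number of relevant items; NDCG uses a log2 position discount); the paper's exact implementation may differ, and the function names are illustrative.

```python
import math


def hits_at_k(ranked, relevant, k=10):
    """1.0 if at least one relevant item appears in the top k, else 0.0."""
    return 1.0 if any(item in relevant for item in ranked[:k]) else 0.0


def recall_at_k(ranked, relevant, k=10):
    """Fraction of the relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for item in ranked[:k] if item in relevant) / len(relevant)


def ndcg_at_k(ranked, relevant, k=10):
    """DCG of the top k divided by the DCG of an ideal ranking."""
    dcg = sum(
        1.0 / math.log2(pos + 2)
        for pos, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


ranked = ["a", "b", "c", "d"]   # model's ranking, best first
relevant = {"b", "z"}           # held-out items the user interacted with
print(hits_at_k(ranked, relevant), recall_at_k(ranked, relevant), ndcg_at_k(ranked, relevant))
```

Dataset-level scores are typically the average of these per-user values.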

Key Results

General CF Model Results: LLM-KT consistently improves performance over baselines on both datasets.
Benchmark	Metric	Baseline	This Paper	Δ (absolute)
Amazon CD and Vinyl	NDCG@10	0.0526	0.0637	+0.0111
ML-1M	NDCG@10	0.4123	0.4285	+0.0162
ML-1M	Recall@10	0.2715	0.2798	+0.0083

Context-Aware Model Results: LLM-KT performs comparably to the state-of-the-art KAR method.
Benchmark	Metric	Baseline	This Paper	Δ (absolute)
ML-1M	AUC	0.9125	0.9132	+0.0007
Amazon CD and Vinyl	AUC	0.7845	0.7853	+0.0008
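
The Δ column reports absolute differences, while the headline figure of about 21% is a relative gain. The short sketch below converts the table's rows to relative improvements; the numbers are copied from the table and the helper name is illustrative.

```python
def relative_gain(baseline, improved):
    """Relative improvement over the baseline, in percent."""
    return (improved - baseline) / baseline * 100.0


rows = [
    ("Amazon CD and Vinyl", "NDCG@10", 0.0526, 0.0637),
    ("ML-1M", "NDCG@10", 0.4123, 0.4285),
    ("ML-1M", "Recall@10", 0.2715, 0.2798),
    ("ML-1M", "AUC", 0.9125, 0.9132),
    ("Amazon CD and Vinyl", "AUC", 0.7845, 0.7853),
]

for benchmark, metric, baseline, improved in rows:
    gain = relative_gain(baseline, improved)
    print(f"{benchmark:20s} {metric:10s} {gain:+.2f}%")
```

The first row works out to roughly +21%, matching the reported NDCG@10 gain, whereas the AUC differences for context-aware models are on the order of 0.1% or less.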

Main Takeaways

LLM-KT provides consistent improvements (up to ~21%) for traditional CF models that cannot naturally accept text inputs.
The method is versatile, working across Matrix Factorization (NeuMF), Contrastive (SimpleX), and VAE-based (MultVAE) architectures.
For context-aware models that can accept LLM-derived input features, LLM-KT's internal reconstruction strategy is competitive with direct input injection (KAR), suggesting that implicit alignment can be about as effective as explicit feature use.

📚 Prerequisite Knowledge

Prerequisites

Collaborative Filtering (CF) basics
Knowledge Distillation / Transfer Learning concepts
Basic understanding of embedding spaces

Key Terms

CF: Collaborative Filtering—methods that predict user preferences based on past interactions of similar users.

LLM-generated profile: A text summary of a user's preferences generated by an LLM based on their interaction history.

pretext task: An auxiliary training objective (here, reconstructing the profile embedding) used to help the model learn better representations for the main task.

NeuMF: Neural Matrix Factorization—a neural network-based CF model combining generalized matrix factorization and multi-layer perceptrons.

SimpleX: A simple CF model trained with a cosine contrastive loss, designed to be a strong yet efficient baseline.

MultVAE: A Variational Autoencoder-based CF model for implicit feedback.

UMAP: Uniform Manifold Approximation and Projection—a dimensionality reduction technique used here to align profile embeddings with model layer dimensions.

KAR: Knowledge Augmented Recommendation, a state-of-the-art baseline that feeds LLM-generated reasoning and factual knowledge to the recommender as input features.

NDCG: Normalized Discounted Cumulative Gain—a ranking metric that values correct recommendations higher when they appear earlier in the list.

CTR: Click-Through Rate, the fraction of users who click an item out of those who are shown it. CTR prediction models estimate the probability that a given user clicks a given item.