One Model for All: Large Language Models are Domain-Agnostic Recommendation Systems

📝 Paper Summary

Sequential Recommendation, Cross-domain Recommendation, LLM for Recommendation

LLM-Rec leverages the world knowledge and semantic understanding of pre-trained Large Language Models to unify user behaviors across multiple domains into a single sequence, addressing data sparsity and cold-start problems without complex domain-specific architectures.

Core Problem

Traditional multi-domain recommendation systems struggle with data sparsity and cold-start issues because they rely on ID-based representations that lack semantic meaning and fail to align items across domains.

Why it matters:

Current cross-domain methods require complex, rigid architectures (e.g., pair-wise links) that scale poorly to many domains
ID-based methods cannot transfer semantic knowledge; a user's interest in 'running shoes' in one domain doesn't naturally map to 'sports drinks' in another without explicit overlap
Existing sequential models often fail to capture long-term dependencies or semantic correlations between diverse user interests

Concrete Example: In a preliminary study, simply concatenating item IDs from five different domains and feeding them into SASRec degraded performance relative to single-domain models, suggesting that ID-based methods fail to capture cross-domain semantic connections.

Key Novelty

LLM-Rec: Domain-Agnostic LLM Framework

Treats multi-domain recommendation as a text-to-text problem by converting item titles into text and concatenating them into a single user history sentence
Uses a single pre-trained LLM backbone to encode both user history and candidate items, relying on the LLM's internal 'world knowledge' to bridge semantic gaps between domains
Demonstrates that larger model sizes (scaling laws) and parameter-efficient fine-tuning (LoRA) significantly benefit recommendation performance, unlike traditional ID-based models

Architecture

The overall framework of LLM-Rec, illustrating how user behaviors from different domains are concatenated into a text sequence, processed by various LLM backbones (Encoder-only, Decoder-only, Encoder-Decoder), and used for next-item prediction.

Evaluation Highlights

Outperforms state-of-the-art baselines like SASRec and UniSRec on 5 diverse datasets, with gains particularly strong in sparse/cold-start scenarios
Larger models yield better performance: scaling from 125M to 6.7B parameters consistently improves recommendation accuracy, consistent with the scaling behavior observed in NLP
Fine-tuning with LoRA achieves comparable or better results than full parameter tuning while requiring significantly fewer trainable parameters

Breakthrough Assessment

7/10

Strong empirical validation of LLMs for multi-domain recommendation without complex graph/task structures. Successfully applies NLP scaling laws to RecSys, though the architectural innovation is primarily the application of existing LLMs to a new setting.

⚙️ Technical Details

Problem Definition

Setting: Multi-domain Sequential Recommendation

Inputs: A user u's mixed sequence of interactions s_u = (v_1, v_2, ..., v_{L'}) from all available domains, represented by item titles

Outputs: Probability of the next item v_{L'+1} in a target domain D^T

Pipeline Flow

Input Construction (Textualization)
LLM Backbone Encoding
Score Prediction

System Modules

Input Construction

Convert item IDs to text titles and concatenate user history into a single sequence

Model or implementation: Tokenizer (specific to LLM backbone)
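
The textualization step can be sketched as follows. This is a minimal illustration, not the paper's code: the `Interaction` record, the `max_items` cut-off, the comma separator, and the example titles and domain names are all assumptions made for the example. Tokenization by the chosen LLM backbone would follow and is not shown.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    item_title: str  # text title that replaces the item ID
    domain: str      # source domain (hypothetical labels; kept for bookkeeping only)
    timestamp: int   # used to order the mixed multi-domain sequence


def build_history_text(interactions, max_items=50, separator=", "):
    """Concatenate the most recent item titles, in time order, into one string."""
    ordered = sorted(interactions, key=lambda x: x.timestamp)
    # Keep only the tail so the text is more likely to fit the LLM context window.
    recent = ordered[-max_items:] if max_items > 0 else []
    return separator.join(i.item_title for i in recent)


history = [
    Interaction("Trail Running Shoes", "Clothing", 3),
    Interaction("The Martian (Paperback)", "Books", 1),
    Interaction("Electrolyte Sports Drink", "Grocery", 4),
    Interaction("Wireless Earbuds", "Electronics", 2),
]
text = build_history_text(history, max_items=3)
print(text)  # Wireless Earbuds, Trail Running Shoes, Electrolyte Sports Drink
```

Note that the domain label never enters the text: items from every domain share one sequence, which is what makes the input format domain-agnostic.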

LLM Backbone

Generate dense vector representations for the user (based on history) and the item (based on title)

Model or implementation: BERT (Encoder-only), OPT (Decoder-only), or FLAN-T5 (Encoder-Decoder)

Prediction Head

Calculate relevance score between user and candidate item

Model or implementation: Dot product
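
A small sketch of dot-product scoring, assuming the user and item vectors have already been produced by the LLM backbone. The three-dimensional vectors and item names below are toy values chosen for illustration, not real embeddings.

```python
def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rank_candidates(user_vec, item_vecs):
    """Score every candidate by dot product and sort from best to worst."""
    scores = {name: dot(user_vec, vec) for name, vec in item_vecs.items()}
    ranking = sorted(scores, key=scores.get, reverse=True)
    return ranking, scores


user_vec = [0.5, 1.0, -0.5]  # toy stand-in for the embedding of the history text
item_vecs = {
    "sports drink": [1.0, 1.0, 0.0],
    "desk lamp": [0.0, -1.0, 1.0],
    "running socks": [1.0, 0.5, 0.0],
}
ranking, scores = rank_candidates(user_vec, item_vecs)
print(ranking)  # ['sports drink', 'running socks', 'desk lamp']
```

Because the score is a plain dot product, candidate item vectors can be computed once and reused across users.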

Novel Architectural Elements

Unified text-based input format that mixes heterogeneous domain items into a single natural language sequence
Application of decoder-only (OPT) and encoder-decoder (FLAN-T5) architectures for embedding-based recommendation (rather than generative recommendation)

Modeling

Base Model: Several backbones were evaluated, namely BERT (Medium/Base/Large), OPT (125M to 6.7B), and FLAN-T5 (Small to XL)

Training Method: Supervised Fine-Tuning (SFT) with Cross-Entropy Loss

Objective Functions:

Purpose: Maximize the probability of the ground-truth next item while minimizing probabilities of negative samples.

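Formally: Binary cross-entropy over positive and negative pairs, L = -∑_j [ log σ(u_j · v_j) + log(1 - σ(u_j · v_neg)) ], where u_j is the user representation, v_j the ground-truth next item, v_neg a sampled negative item, and σ the sigmoid function.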
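
The objective can be made concrete with a short numeric sketch. It assumes one negative per positive and takes the dot-product scores u_j · v_j and u_j · v_neg as given; the function names are illustrative. It is written for readability rather than numerical stability, so very large scores would need a log-sigmoid formulation.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bce_loss(pos_scores, neg_scores):
    """Sum over pairs of -log(sigmoid(s_pos)) - log(1 - sigmoid(s_neg))."""
    total = 0.0
    for s_pos, s_neg in zip(pos_scores, neg_scores):
        total -= math.log(sigmoid(s_pos))        # push the positive score up
        total -= math.log(1.0 - sigmoid(s_neg))  # push the negative score down
    return total


uninformed = bce_loss([0.0], [0.0])  # both scores at 0, so the loss is 2 * ln(2)
separated = bce_loss([3.0], [-3.0])  # positive scored high, negative scored low
print(uninformed, separated)
```

The loss falls as the positive score rises and the negative score drops, which is the behavior the Purpose line describes.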

Adaptation: LoRA (Low-Rank Adaptation) and Full Fine-tuning compared

Trainable Parameters: Varies by setting; LoRA trains <1% of parameters
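
To see why LoRA can stay under 1% of parameters, here is a back-of-the-envelope calculation for a single weight matrix. The 4096 x 4096 shape and rank 8 are illustrative assumptions, not values reported in the paper.

```python
def lora_trainable_fraction(d_out, d_in, rank):
    """Trainable share when a d_out x d_in weight W stays frozen and only the
    low-rank factors B (d_out x rank) and A (rank x d_in) are learned."""
    full = d_out * d_in
    lora = rank * (d_out + d_in)
    return lora / full


frac = lora_trainable_fraction(4096, 4096, 8)
print(f"{frac:.4%}")  # 0.3906%
```

This is the ratio for one adapted matrix. The model-wide fraction is typically lower still, since LoRA is usually applied to only some of the weight matrices while embeddings and other layers remain frozen.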

Key Hyperparameters:

negative_samples: 1
batch_size: Not explicitly reported in the paper
learning_rate: Not explicitly reported in the paper

Compute: Experiments conducted on models up to 6.7B parameters

Comparison to Prior Work

vs. SASRec: LLM-Rec uses text instead of IDs and a single model for all domains
vs. UniSRec: LLM-Rec processes multi-domain data in a single stage rather than pre-train/fine-tune
vs. Recformer: LLM-Rec investigates decoder-only and larger scale models (up to 6.7B) rather than just encoder-only architectures
vs. C2DSR: LLM-Rec avoids complex pair-wise domain linking objectives (A_N^2 complexity, i.e. on the order of N^2 domain pairs for N domains)

Limitations

Computational cost of inference with 6.7B parameter models is significantly higher than ID-based embeddings
Requires item titles to be meaningful; may fail if text descriptions are poor quality
Maximum sequence length of LLMs may limit the length of user history considered
No specific details provided on latency or real-time serving feasibility

Reproducibility

No code URL is provided in the paper. The datasets are real-world, and their sources and preprocessing are described only as standard. Batch size and learning rate are not reported in the text.

📊 Experiments & Results

Evaluation Setup

Next-item prediction on mixed multi-domain sequences

Benchmarks:

Comparison 1 (5 Domains) (Multi-domain Sequential Recommendation)

Metrics:

NDCG@10
Hit Rate@10 (HR@10)
Statistical methodology: Not explicitly reported in the paper
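
For reference, both metrics reduce to simple formulas when each test case has exactly one ground-truth next item. The sketch below assumes that setting and a 1-based rank of the true item among the scored candidates. Whether the paper ranks against all items or a sampled candidate set is not stated here, and the ranks are made-up values.

```python
import math


def hit_rate_at_k(rank, k=10):
    """1 if the true item appears in the top k, else 0."""
    return 1.0 if rank <= k else 0.0


def ndcg_at_k(rank, k=10):
    """With one relevant item the ideal DCG is 1, so NDCG = 1 / log2(rank + 1)."""
    return 1.0 / math.log2(rank + 1) if rank <= k else 0.0


ranks = [1, 3, 12]  # rank of the true next item for three hypothetical users
hr = sum(hit_rate_at_k(r) for r in ranks) / len(ranks)
ndcg = sum(ndcg_at_k(r) for r in ranks) / len(ranks)
print(hr, ndcg)  # HR@10 is 2/3, NDCG@10 is 0.5
```

HR@10 only checks whether the item made the top ten, while NDCG@10 also rewards placing it higher, which is why the two can diverge.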

Key Results

Ablation and model scaling experiments demonstrate the impact of model size and fine-tuning strategies.
Benchmark	Metric	Baseline	This Paper	Δ
5-Domain Average	NDCG@10	0.3347	0.4578	+0.1231
5-Domain Average	NDCG@10	0.2650	0.2710	+0.0060

Experiment Figures

Performance comparison (NDCG@10) of SASRec trained on single domains vs. SASRec trained on mixed multi-domain data.

Main Takeaways

Simply merging ID-based interaction data from multiple domains degrades performance, whereas processing the same data as text via LLMs improves it, suggesting that LLMs bridge semantic gaps between domains.
Scaling law holds: Performance consistently improves as the model size increases from 125M to 6.7B parameters.
LLMs are particularly effective for cold-start and unpopular items, where they rely on semantic understanding rather than collaborative signals.
Decoder-only architectures (like OPT) generally perform well, and LoRA is an effective and efficient tuning strategy compared to full fine-tuning.

📚 Prerequisite Knowledge

Prerequisites

Transformer architecture (Self-Attention, Encoder/Decoder)
Sequential Recommendation (SASRec, BERT4Rec)
Large Language Models (BERT, OPT, FLAN-T5)
Parameter-Efficient Fine-Tuning (LoRA)

Key Terms

Sequential Recommendation: Predicting the next likely item a user will interact with based on their chronological history of past interactions

Cold-start: The difficulty of recommending items to new users or recommending new items that have few or no prior interactions

LoRA: Low-Rank Adaptation—a technique to fine-tune large models by training small rank-decomposition matrices while keeping the main model weights frozen

Encoder-only: Transformer architectures like BERT that use bi-directional attention, suitable for understanding context but not generating text

Decoder-only: Transformer architectures like GPT/OPT that use uni-directional attention (causal), typically used for text generation

Encoder-Decoder: Transformer architectures like T5 that process input with an encoder and generate output with a decoder
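
The bi-directional versus causal distinction in the Encoder-only and Decoder-only entries comes down to which positions each token may attend to. The helper below is a minimal illustrative sketch and is not taken from any library.

```python
def attention_mask(seq_len, causal):
    """mask[i][j] is 1 if position i may attend to position j, else 0."""
    return [
        [1 if (not causal or j <= i) else 0 for j in range(seq_len)]
        for i in range(seq_len)
    ]


bidirectional = attention_mask(3, causal=False)  # encoder-style: every token sees all tokens
causal = attention_mask(3, causal=True)          # decoder-style: only current and earlier tokens

for row in causal:
    print(row)
# [1, 0, 0]
# [1, 1, 0]
# [1, 1, 1]
```

Under the causal mask only the final position sees the whole history. That is one reason decoder-only models are often read out from the last token, though the source does not say how LLM-Rec pools its representations.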

SASRec: Self-Attentive Sequential Recommendation—a standard baseline model that uses a Transformer encoder to model user interaction sequences based on item IDs