Large Language Models Make Sample-Efficient Recommender Systems

📝 Paper Summary

LLM-enhanced Recommender Systems
Sample Efficiency in Recommendation

Laser demonstrates that LLMs can function as highly sample-efficient recommenders themselves or enhance conventional models by generating user/item knowledge that compensates for data sparsity.

Core Problem

Conventional recommendation models (CRMs) suffer from sample inefficiency, requiring massive amounts of interaction data to learn effective ID-based representations due to feature sparsity.

Why it matters:

Data sparsity remains a critical bottleneck in recommender systems, making it difficult to train effective models for new users or items with few interactions (cold start)
Collecting and annotating large-scale recommendation datasets is resource-intensive and costly
Existing methods relying on ID embeddings struggle to generalize when labeled data is scarce

Concrete Example: A standard CRM like DeepFM fails to predict user clicks accurately when trained on only 10% of the data because ID embeddings are under-trained. Laser compensates by using an LLM to generate text-based user profiles and item descriptions, providing rich semantic features even with minimal interaction history.

Key Novelty

Laser Framework (LLM-Enhanced Sample Efficiency)

Validates that LLMs themselves are inherently sample-efficient recommenders capable of few-shot preference inference using open-world knowledge
Proposes a hybrid paradigm where LLMs generate textual user/item knowledge that is encoded via Mixture-of-Experts (MoE) adapters to augment conventional ID-based models

Architecture

The overall framework of Laser illustrating two paradigms: Laser (LLM only) and Laser (LLM+CRM).

Evaluation Highlights

Laser (LLM only) matches or surpasses conventional models trained on the entire dataset while using only 10% of the training samples
Laser (LLM+CRM) matches full-dataset baselines using only 50% of the training data
Demonstrates superior sample efficiency on both BookCrossing and MovieLens-1M datasets compared to strong baselines like SIM and DIN

Breakthrough Assessment

7/10

Systematically quantifies the sample efficiency benefits of LLMs in recommendation. While the methods (LLM-as-recommender, LLM-as-feature-encoder) are known, the specific focus on sample efficiency and the hybrid MoE integration offers a solid empirical contribution.

⚙️ Technical Details

Problem Definition

Setting: Click-Through Rate (CTR) prediction formulated as binary classification

Inputs: User behavior history x, target item, context

Outputs: Predicted click probability y_hat in [0, 1]

Pipeline Flow

Data Processing (Input -> Textual Prompts)
Paradigm 1: Laser (LLM only) - Direct Inference
Paradigm 2: Laser (LLM+CRM) - Feature Generation & Hybrid Inference

System Modules

Prompt Constructor

Converts user history and item data into natural language templates

Model or implementation: Template-based formatting
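The template-based conversion described above can be sketched as follows. This is a minimal illustration, not the paper's actual template: the function name `build_prompt`, the `(title, rating)` history format, the 5-point rating scale, and the prompt wording are all assumptions made for the example.

```python
from typing import List, Tuple


def build_prompt(history: List[Tuple[str, int]], target_title: str,
                 max_history: int = 5) -> str:
    """Render a user's rated history and a target item as a Yes/No question.

    Hypothetical template: the wording and the rating scale are illustrative.
    """
    # Keep only the most recent interactions so the prompt stays short.
    recent = history[-max_history:]
    lines = [f"- {title} (rating: {rating}/5)" for title, rating in recent]
    history_text = "\n".join(lines) if lines else "- (no history)"
    return (
        "The user rated the following items:\n"
        f"{history_text}\n"
        f'Will the user like "{target_title}"? Answer Yes or No.'
    )
```

Truncating to the latest `max_history` interactions is one simple way to bound prompt length; the summary does not state how the paper selects which behaviors to include.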

LLM Recommender (Laser LLM only)

Directly predicts user preference (Yes/No) via causal language modeling

Model or implementation: Vicuna-13B
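A CTR metric such as AUC needs a continuous score rather than a hard Yes/No answer. The summary does not say how Laser derives that score, so the sketch below shows one common option as an assumption: a softmax restricted to the logits of the two answer tokens. The function name `click_probability` and its inputs are hypothetical.

```python
import math


def click_probability(yes_logit: float, no_logit: float) -> float:
    """Return P(Yes) from a softmax over only the 'Yes' and 'No' token logits.

    Assumption: the two logits are read from the LLM's next-token
    distribution at the answer position.
    """
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(yes_logit, no_logit)
    e_yes = math.exp(yes_logit - m)
    e_no = math.exp(no_logit - m)
    return e_yes / (e_yes + e_no)
```

Restricting the softmax to two tokens guarantees the output lies in [0, 1], which matches the predicted click probability defined in the problem setting.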

Knowledge Generator (Paradigm 2: Hybrid Inference)

Generates textual descriptions/reasoning for user preferences and item facts

Model or implementation: LLM (Vicuna-13B)

MoE Adapter (Paradigm 2: Hybrid Inference)

Encodes LLM-generated knowledge into dense vectors aligned with CRM space

Model or implementation: Parallel Mixture-of-Experts (MLP gating + MLP experts)
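To make the gating-plus-experts idea concrete, here is a minimal dense MoE forward pass in plain Python. It is a simplified sketch under stated assumptions: the gate and each expert are a single linear layer (the summary describes MLPs), weights are randomly initialized, and the class name `MoEAdapter` and all dimensions are hypothetical.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def softmax(z: Vector) -> Vector:
    m = max(z)  # shift by the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class MoEAdapter:
    """Maps an LLM knowledge vector to a smaller vector for the CRM.

    Simplification (assumption): one linear layer per expert and for the gate.
    """

    def __init__(self, in_dim: int, out_dim: int, num_experts: int, seed: int = 0):
        rng = random.Random(seed)
        self.gate = random_matrix(num_experts, in_dim, rng)
        self.experts = [random_matrix(out_dim, in_dim, rng)
                        for _ in range(num_experts)]

    def gate_weights(self, x: Vector) -> Vector:
        # One non-negative weight per expert, summing to 1.
        return softmax(matvec(self.gate, x))

    def forward(self, x: Vector) -> Vector:
        weights = self.gate_weights(x)
        # Each expert applies a linear map followed by ReLU.
        outputs = [[max(0.0, v) for v in matvec(w, x)] for w in self.experts]
        out_dim = len(outputs[0])
        # Combine expert outputs as a gate-weighted sum.
        return [sum(g * out[j] for g, out in zip(weights, outputs))
                for j in range(out_dim)]
```

In the hybrid paradigm the adapter's output would be fed to the CRM alongside the ID features; how the two are combined is not specified in this summary.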

CRM Backbone (Paradigm 2: Hybrid Inference)

Predicts CTR using both original ID features and LLM-augmented vectors

Model or implementation: SIM (Search-based Interest Model)

Novel Architectural Elements

Use of parallel MoE adapters specifically to bridge frozen LLM knowledge representations with trainable CRM feature spaces
Decomposition of inference into cachable offline knowledge generation (LLM) and online prediction (CRM)
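The offline/online split can be illustrated with a small cache: the expensive generation step runs once per user or item ahead of time, and online prediction only performs lookups. This is a sketch under assumptions; `KnowledgeCache`, the string keys, and the `generate` callable standing in for the LLM are hypothetical names, not part of the paper.

```python
from typing import Callable, Dict, Iterable, List


class KnowledgeCache:
    """Stores precomputed knowledge vectors so online inference skips the LLM."""

    def __init__(self, generate: Callable[[str], List[float]]):
        # `generate` stands in for the LLM plus encoder (assumption).
        self._generate = generate
        self._store: Dict[str, List[float]] = {}
        self.llm_calls = 0  # counts how often the expensive step ran

    def precompute(self, keys: Iterable[str]) -> None:
        """Offline stage: run the generator once per unseen key."""
        for key in keys:
            if key not in self._store:
                self._store[key] = self._generate(key)
                self.llm_calls += 1

    def lookup(self, key: str) -> List[float]:
        """Online stage: a dictionary read; raises KeyError if not precomputed."""
        return self._store[key]
```

Raising on a cache miss rather than calling the generator keeps the online path free of LLM latency; a production system would need its own policy for users or items that have not been precomputed.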

Modeling

Base Model: Vicuna-13B

Training Method: Instruction Tuning (for LLM only) and Supervised Learning (for CRM)

Objective Functions:

Purpose: Optimize LLM to generate 'Yes'/'No' for click prediction.

Formally: Causal Language Modeling loss minimizing negative log-likelihood of target tokens

Purpose: Optimize CRM with augmented features for binary classification.

Formally: Binary Cross-Entropy (LogLoss)
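Written out, the two objectives take their standard forms. The notation is an assumption introduced here rather than taken from the paper: $\theta$ denotes the LLM parameters, $w_1, \dots, w_T$ the target tokens that follow the prompt, $N$ the number of training samples, $y_i \in \{0, 1\}$ the click label, and $\hat{y}_i$ the predicted click probability.

```latex
\mathcal{L}_{\mathrm{CLM}}(\theta)
  = -\sum_{t=1}^{T} \log p_{\theta}\left(w_t \mid \mathrm{prompt},\, w_{<t}\right)

\mathcal{L}_{\mathrm{BCE}}
  = -\frac{1}{N} \sum_{i=1}^{N}
    \left[\, y_i \log \hat{y}_i + (1 - y_i) \log\left(1 - \hat{y}_i\right) \right]
```

When the target is the single token 'Yes' or 'No', $T = 1$ and the first loss reduces to the negative log-probability of the correct answer token.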

Adaptation: Not explicitly detailed for the LLM tuning stage; LoRA is assumed here because it is typical for LLM fine-tuning. MoE adapters are used for the hybrid model.

Training Data:

BookCrossing and MovieLens-1M datasets
Laser (LLM only) trained on 10% subset
Laser (LLM+CRM) trained on 50% subset
Baselines trained on 100% dataset
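The reduced-data settings above can be reproduced in spirit with a seeded random subsample of the training set. The summary does not state how the paper draws its subsets, so uniform sampling without replacement is an assumption, and `subsample` is a hypothetical helper.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def subsample(samples: Sequence[T], fraction: float, seed: int = 0) -> List[T]:
    """Draw a fixed fraction of the samples uniformly without replacement."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    # Keep at least one sample when the input is non-empty.
    k = min(len(samples), max(1, round(len(samples) * fraction)))
    # A seeded generator makes the subset reproducible across runs.
    return random.Random(seed).sample(list(samples), k)
```

For example, `subsample(train_rows, 0.1)` would correspond to the 10% setting and `subsample(train_rows, 0.5)` to the 50% setting, where `train_rows` is whatever sequence holds the training interactions.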

Key Hyperparameters:

Computational requirements: Inference complexity of Laser (LLM+CRM) is O(f(n,m)), the same as the backbone CRM, provided the LLM-generated knowledge is cached

Compute: Laser (LLM+CRM) inference latency is comparable to standard CRMs due to offline caching. Laser (LLM only) has high latency.

Comparison to Prior Work

vs. SIM/DIN: Laser incorporates open-world knowledge from LLMs to handle data sparsity better
vs. P5: Laser focuses specifically on sample efficiency and proposes a hybrid CRM+LLM architecture rather than just fine-tuning T5
vs. TALLRec [not cited in paper]: Laser explores both pure LLM and hybrid CRM+LLM paradigms specifically for sample efficiency, whereas TALLRec focuses primarily on instruction tuning effectiveness

Limitations

Laser (LLM only) suffers from high inference latency, making it impractical for real-time industrial systems without distillation
Laser (LLM+CRM) is less sample-efficient than the pure LLM approach (requires 50% data vs 10% to match full baselines)
Reliance on the quality of textual metadata; performance may degrade if item descriptions are poor or missing

Reproducibility

Datasets are public (BookCrossing, MovieLens-1M). Code URL not explicitly provided in the text. Uses open-source Vicuna-13B model.

📊 Experiments & Results

Evaluation Setup

CTR prediction on public datasets

Benchmarks:

BookCrossing (CTR Prediction)
MovieLens-1M (CTR Prediction)

Metrics:

AUC
LogLoss
Statistical methodology: Not explicitly reported in the paper
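For readers unfamiliar with the two metrics, the following sketch computes them from labels and predicted probabilities using only the standard library. It is a reference illustration, not the paper's evaluation code; the pairwise AUC here is quadratic in the number of samples and is meant for clarity rather than speed.

```python
import math
from typing import Sequence


def log_loss(labels: Sequence[int], probs: Sequence[float],
             eps: float = 1e-15) -> float:
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


def auc(labels: Sequence[int], probs: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [p for y, p in zip(labels, probs) if y == 1]
    neg = [p for y, p in zip(labels, probs) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p_pos in pos:
        for p_neg in neg:
            if p_pos > p_neg:
                wins += 1.0
            elif p_pos == p_neg:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Higher AUC is better and lower LogLoss is better. AUC depends only on the ranking of predictions, while LogLoss also penalizes poorly calibrated probabilities.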

Key Results

Laser approaches achieve comparable or better performance than baselines trained on the full dataset, while using significantly less data.

Benchmark	Metric	Baseline	This Paper	Δ
BookCrossing / MovieLens-1M	AUC	Not explicitly reported in the paper	Not explicitly reported in the paper	Not explicitly reported in the paper

Main Takeaways

LLMs act as highly sample-efficient recommenders: Laser (LLM only) needs only ~10% of training data to match conventional models trained on 100% data.
Hybrid approach (LLM+CRM) improves CRM sample efficiency, matching full-data performance with ~50% of the data.
LLM inference latency can be mitigated in the hybrid approach by caching generated features offline, so online inference complexity stays the same as the backbone CRM's.

📚 Prerequisite Knowledge

Prerequisites

Basics of Recommender Systems (CTR prediction, ID embeddings)
Large Language Models (Instruction Tuning, Causal Language Modeling)
Mixture of Experts (MoE) architecture

Key Terms

CRM: Conventional Recommendation Model—traditional deep learning models for recommendation (e.g., DeepFM, DIN) that rely heavily on ID-based embeddings

CTR: Click-Through Rate—the probability that a user will click on a recommended item

Sample Efficiency: The ability of a model to achieve high performance with a small amount of training data

Instruction Tuning: Fine-tuning LLMs on specific tasks formatted as natural language instructions

MoE: Mixture of Experts—a neural network architecture that uses a gating mechanism to select different sub-networks (experts) for different inputs

Laser: The proposed framework, short for LArge Language Models Make Sample-Efficient Recommender Systems

DIN: Deep Interest Network—a sequential recommendation model that uses attention mechanisms to capture user interests from behavior history

SIM: Search-based Interest Model—a sequential recommendation model that retrieves relevant user behaviors to model long-term interests