LLMRec: Large Language Models with Graph Augmentation for Recommendation

📝 Paper Summary

LLM-based Recommendation | Graph Augmentation for Recommender Systems | Side Information Enhancement

LLMRec uses Large Language Models to augment interaction graphs by predicting user-item edges and generating node attributes, then refines this data via noise pruning and masked autoencoders to improve collaborative filtering.

Core Problem

Collaborative filtering suffers from sparse implicit feedback, and incorporating side information often introduces noise, heterogeneity, or low-quality data that hinders accurate user preference modeling.

Why it matters:

Sparse interaction data limits the effectiveness of Graph Neural Networks (GNNs) in capturing user preferences
Side information (e.g., text descriptions) on platforms like Netflix is often noisy, incomplete, or irrelevant to the collaborative filtering task
Existing augmentation methods (like contrastive learning) may not fully leverage the semantic richness and reasoning capabilities of LLMs

Concrete Example: In a micro-video recommender, irrelevant textual titles that fail to capture the video's content introduce noise. Similarly, privacy concerns may lead to missing user profiles. Standard models struggle to learn from this incomplete or noisy heterogeneous data.

Key Novelty

LLM-based Graph Augmentation (Edges + Attributes + Profiles)

Uses an LLM as a knowledge-aware sampler to predict likely positive/negative user-item interactions (edges) from a candidate pool based on natural language reasoning
Generates missing user profiles and enhances item attributes using the LLM's world knowledge to bridge heterogeneous feature gaps
Employs a robustification mechanism with noise pruning for edges and MAE-based feature enhancement to filter out unreliable augmented data

Architecture

The overall LLMRec framework. It illustrates the three-step augmentation process: (1) LLM-based implicit feedback augmentation (Edge Augmentation), (2) LLM-based side information augmentation (Node Attribute Augmentation), and (3) the robust training pipeline with noise pruning and MAE enhancement.

Evaluation Highlights

Outperforms state-of-the-art baselines (e.g., MMSSL, LATTICE) on Netflix and MovieLens datasets
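Improves Recall@10 and NDCG@10 over the reported baseline on both datasets (e.g., Netflix Recall@10 rises from 0.1695 to 0.1802, MovieLens NDCG@10 from 0.1825 to 0.1983), and also outperforms base models like LightGCN and various augmentation strategies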
Demonstrates robustness to data sparsity and noise through ablation studies and varying training data ratios

Breakthrough Assessment

7/10

Novel application of LLMs specifically for *graph augmentation* (both edges and features) in standard CF pipelines, addressing data quality issues directly. Strong results, though it relies on standard LLM inference rather than a new architecture.

⚙️ Technical Details

Problem Definition

Setting: Top-N Recommendation with Implicit Feedback and Side Information

Inputs: User set U, Item set I, sparse implicit feedback graph E+, and side information features F

Outputs: Ranked list of items for each user, predicted by learning collaborative graph embeddings E

Pipeline Flow

LLM Edge Augmentation: LLM selects positive/negative items from candidates
LLM Attribute Augmentation: LLM generates user profiles and item attributes
Feature Encoding & Injection: Encode text to vectors, project, and inject into GNN
Robust Optimization: Train with BPR loss, noise pruning, and MAE feature reconstruction
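
Step 3 of the flow above (encode, project, inject) can be illustrated with a minimal sketch. It assumes a single linear projection and additive injection into the ID embedding; the summary does not state the exact fusion used, so `project`, `inject`, and the `scale` parameter are hypothetical names chosen for illustration.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]  # row-major: one row per output dimension


def project(text_vec: Vector, weight: Matrix, bias: Vector) -> Vector:
    """Linear projection from the text-encoder space into the embedding space."""
    return [
        sum(w * x for w, x in zip(row, text_vec)) + b
        for row, b in zip(weight, bias)
    ]


def inject(id_emb: Vector, projected: Vector, scale: float = 1.0) -> Vector:
    """Assumed fusion: add the projected side information to the ID embedding."""
    return [e + scale * p for e, p in zip(id_emb, projected)]


# Toy example: a 3-d encoded text feature mapped into a 2-d embedding space.
text_feature = [1.0, 2.0, 3.0]
weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
bias = [0.0, 0.0]
item_id_emb = [0.5, 0.5]

# The fused vector would serve as the node's input to the GNN encoder.
node_input = inject(item_id_emb, project(text_feature, weight, bias))
```

In a real pipeline the projection weights would be learned jointly with the recommender; fixed values are used here only to keep the example checkable by hand.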

System Modules

LLM Sampler (Data Augmentation)

Selects likely positive/negative items from a candidate pool to reinforce interaction edges

Model or implementation: LLM (exact model not specified in the excerpt; likely an OpenAI GPT-series model, judging by typical usage)
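
A rough sketch of how such a sampler could be wired up is shown below. The prompt wording, the `POS::NEG` reply format, and every function name are assumptions made for illustration, not the paper's actual prompts; `llm` stands in for any text-in, text-out callable.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple


def build_sampler_prompt(history: List[str], candidates: Dict[int, str]) -> str:
    """Format a user's history and a candidate pool into one prompt (hypothetical wording)."""
    lines = [
        "Items the user has interacted with: " + "; ".join(history),
        "Candidate items:",
    ]
    lines += [f"[{idx}] {title}" for idx, title in candidates.items()]
    lines.append(
        "Answer with the index of the item the user is most likely to enjoy "
        "and the index of the item they are least likely to enjoy, "
        "formatted as POS::NEG."
    )
    return "\n".join(lines)


def parse_sampler_reply(
    reply: str, candidates: Dict[int, str]
) -> Optional[Tuple[int, int]]:
    """Extract (positive, negative) indices; reject replies outside the pool."""
    match = re.search(r"(\d+)\s*::\s*(\d+)", reply)
    if match is None:
        return None
    pos, neg = int(match.group(1)), int(match.group(2))
    if pos == neg or pos not in candidates or neg not in candidates:
        return None
    return pos, neg


def sample_pair(
    llm: Callable[[str], str],
    user_id: int,
    history: List[str],
    candidates: Dict[int, str],
) -> Optional[Tuple[int, int, int]]:
    """Return one augmented (user, positive item, negative item) triple, or None."""
    pair = parse_sampler_reply(llm(build_sampler_prompt(history, candidates)), candidates)
    return None if pair is None else (user_id, pair[0], pair[1])


# Usage with a stubbed LLM so the example runs offline.
fake_llm = lambda prompt: "12::47"
pool = {12: "Inception", 47: "Cars 3", 8: "Memento"}
triple = sample_pair(fake_llm, user_id=0, history=["Interstellar"], candidates=pool)
```

Validating the reply against the candidate pool matters because an LLM may return indices that were never offered; such replies are discarded here rather than repaired.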

LLM Profiler/Encoder (Data Augmentation)

Generates user profiles and item attributes, then encodes them into feature vectors

Model or implementation: LLM

Recommender Encoder

Learns user/item representations using GNNs on the augmented graph

Model or implementation: LightGCN (as base GNN encoder)
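
For readers unfamiliar with LightGCN, its propagation rule can be sketched in plain Python as follows: symmetric degree normalization, no feature transformation or non-linearity, and a uniform average over layers. The function name and dictionary-based data layout are illustrative choices, and in LLMRec the edge list would be the augmented graph.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Embeddings = Dict[int, Vector]


def lightgcn_propagate(
    edges: List[Tuple[int, int]],
    user_emb: Embeddings,
    item_emb: Embeddings,
    num_layers: int = 2,
) -> Tuple[Embeddings, Embeddings]:
    """Propagate embeddings over (user, item) edges and average all layers."""
    deg_u = {u: 0 for u in user_emb}
    deg_i = {i: 0 for i in item_emb}
    for u, i in edges:
        deg_u[u] += 1
        deg_i[i] += 1

    dim = len(next(iter(user_emb.values())))
    layers_u, layers_i = [user_emb], [item_emb]
    for _ in range(num_layers):
        prev_u, prev_i = layers_u[-1], layers_i[-1]
        next_u = {u: [0.0] * dim for u in prev_u}
        next_i = {i: [0.0] * dim for i in prev_i}
        for u, i in edges:
            # Symmetric normalization 1 / sqrt(|N(u)| * |N(i)|).
            w = 1.0 / math.sqrt(deg_u[u] * deg_i[i])
            for d in range(dim):
                next_u[u][d] += w * prev_i[i][d]
                next_i[i][d] += w * prev_u[u][d]
        layers_u.append(next_u)
        layers_i.append(next_i)

    def mean_over_layers(layers: List[Embeddings]) -> Embeddings:
        return {
            n: [sum(layer[n][d] for layer in layers) / len(layers) for d in range(dim)]
            for n in layers[0]
        }

    return mean_over_layers(layers_u), mean_over_layers(layers_i)


# Toy graph: one user, one item, one edge, one propagation layer.
final_u, final_i = lightgcn_propagate(
    [(0, 0)], {0: [1.0, 0.0]}, {0: [0.0, 1.0]}, num_layers=1
)
# The ranking score is the inner product of the final embeddings.
score = sum(a * b for a, b in zip(final_u[0], final_i[0]))
```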

Denoising Module

Filters unreliable edges and reconstructs masked features

Model or implementation: Sort-and-Drop (Edges), MAE (Features)
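
The edge-side half of this module can be sketched as a loss-based filter. This is a minimal interpretation that assumes the highest-loss samples are the ones dropped and that the pruning rate is a fixed fraction per batch; the summary only says the rate is dynamic, and the function names here are hypothetical.

```python
import math
from typing import List, Tuple


def bpr_sample_loss(pos_score: float, neg_score: float) -> float:
    """-ln(sigmoid(pos - neg)), computed in a numerically stable way."""
    x = pos_score - neg_score
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


def sort_and_drop(score_pairs: List[Tuple[float, float]], prune_rate: float) -> List[int]:
    """Return indices of the samples kept after dropping the highest-loss fraction."""
    losses = [bpr_sample_loss(p, n) for p, n in score_pairs]
    order = sorted(range(len(losses)), key=losses.__getitem__)  # ascending loss
    n_keep = len(losses) - int(len(losses) * prune_rate)
    return sorted(order[:n_keep])


# Four (positive score, negative score) pairs; half are pruned.
batch = [(2.0, 0.0), (0.0, 3.0), (1.0, 0.5), (0.0, 1.0)]
kept = sort_and_drop(batch, prune_rate=0.5)
```

Because the filter looks only at loss magnitude, genuinely hard but correct samples are removed together with noisy ones, which is the trade-off noted under Limitations.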

Novel Architectural Elements

Integration of LLM-based pairwise sampling directly into BPR training loop
Dual-augmentation strategy combining graph structure (edges) and node attributes (features) simultaneously
Robustification mechanism combining dynamic loss-based edge pruning with MAE-based feature enhancement

Modeling

Base Model: LightGCN (backbone recommender) + LLM (augmentor)

Training Method: Joint optimization of BPR loss (recommendation) and Reconstruction loss (MAE)

Objective Functions:

Purpose: Optimize ranking to ensure positive items score higher than negatives.

Formally: L_BPR = sum(-ln(sigmoid(y_ui+ - y_ui-))) over the pruned augmented dataset.

Purpose: Reconstruct masked features to ensure robustness against feature noise.

Formally: L_MAE = sum((f - f_reconstructed)^2) over masked nodes.

Purpose: Total objective.

Formally: L = L_BPR + lambda1 * L_MAE + lambda2 * Regularization
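
The three objectives above can be evaluated on toy values as in the sketch below. The regularization term is passed in as a plain number (for example, a squared L2 norm of the parameters) because the summary does not define it; all names and the lambda values are illustrative.

```python
import math
from typing import Dict, Iterable, List, Tuple

Vector = List[float]


def bpr_loss(score_pairs: Iterable[Tuple[float, float]]) -> float:
    """Sum of -ln(sigmoid(y_pos - y_neg)) over (positive, negative) score pairs."""
    total = 0.0
    for pos, neg in score_pairs:
        x = pos - neg
        total += max(-x, 0.0) + math.log1p(math.exp(-abs(x)))
    return total


def mae_loss(
    features: Dict[int, Vector],
    reconstructed: Dict[int, Vector],
    masked_nodes: Iterable[int],
) -> float:
    """Squared reconstruction error, summed over masked nodes only."""
    return sum(
        (f - r) ** 2
        for node in masked_nodes
        for f, r in zip(features[node], reconstructed[node])
    )


def total_loss(bpr: float, mae: float, reg: float, lambda1: float, lambda2: float) -> float:
    """L = L_BPR + lambda1 * L_MAE + lambda2 * Regularization."""
    return bpr + lambda1 * mae + lambda2 * reg


features = {0: [1.0, 2.0], 1: [0.0, 1.0]}
reconstructed = {0: [1.0, 1.0], 1: [5.0, 5.0]}

l_bpr = bpr_loss([(0.0, 0.0)])                           # ln(2): the model is undecided
l_mae = mae_loss(features, reconstructed, masked_nodes=[0])  # node 1 is unmasked, so ignored
loss = total_loss(l_bpr, l_mae, reg=2.0, lambda1=0.5, lambda2=0.1)
```

Restricting the reconstruction error to masked nodes is what makes the second term a masked-autoencoder objective rather than a plain autoencoder loss.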

Training Data:

Datasets: Netflix, MovieLens
Augmented data generated via LLM prompts from raw data

Key Hyperparameters:

batch_size: Not reported in the paper
learning_rate: Not reported in the paper
mask_ratio: Not explicitly reported in the paper
pruning_rate_omega4: Dynamic parameter

Compute: Not reported in the paper

Comparison to Prior Work

vs. MMSSL/MICRO: LLMRec uses generative augmentation via LLMs rather than contrastive views, allowing explicit reasoning about preferences.
vs. LATTICE: LLMRec augments user-item edges directly using LLM knowledge, whereas LATTICE focuses on item-item structure learning.
vs. SimGCL: LLMRec incorporates semantic side information (text) into the augmentation, unlike SimGCL which relies on structural perturbation.
vs. TALLRec [not cited in paper]: TALLRec fine-tunes the LLM as the recommender itself; LLMRec uses the LLM as a data factory to train a traditional GNN, which is more efficient for inference.

Limitations

Dependence on the quality and cost of the external LLM API for augmentation
Candidate pool for LLM sampling depends on a base recommender, potentially propagating existing biases
Pruning mechanism relies on loss values, which might discard hard-but-informative samples along with noise

Reproducibility

Code: https://github.com/HKUDS/LLMRec

Code and augmented data are publicly available at https://github.com/HKUDS/LLMRec.git. The specific LLM used (e.g., GPT-3.5/4) is not named in the text provided, but the linked code allows verification.

📊 Experiments & Results

Evaluation Setup

Top-N recommendation on implicit feedback datasets with side information

Benchmarks:

Netflix (Movie Recommendation)
MovieLens (Movie Recommendation)

Metrics:

Recall@10
NDCG@10
Recall@20
NDCG@20
Statistical methodology: Not explicitly reported in the paper
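
For reference, the two metric families listed above are commonly computed as in the sketch below, using binary relevance. The paper's exact evaluation protocol (for instance, how the candidate set is built) is not given in this summary, so this should be read as the standard definitions rather than the authors' evaluation code.

```python
import math
from typing import List, Set


def recall_at_k(ranked: List[int], relevant: Set[int], k: int) -> float:
    """Fraction of a user's held-out items that appear in the top-k list."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked: List[int], relevant: Set[int], k: int) -> float:
    """Binary-relevance NDCG: DCG of the top-k list over the best achievable DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal_hits = min(k, len(relevant))
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# One user: the model ranks items 5, 2, 9, 1; the held-out items are 2, 1, 7.
ranked_items = [5, 2, 9, 1]
held_out = {2, 1, 7}
r3 = recall_at_k(ranked_items, held_out, k=3)
n3 = ndcg_at_k(ranked_items, held_out, k=3)
```

Reported scores are typically the mean of these per-user values over all test users.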

Key Results

Experimental results demonstrate that LLMRec consistently outperforms baseline methods across datasets.

Benchmark	Metric	Baseline	This Paper	Δ
Netflix	Recall@10	0.1695	0.1802	+0.0107
Netflix	NDCG@10	0.1064	0.1165	+0.0101
MovieLens	Recall@10	0.2766	0.2895	+0.0129
MovieLens	NDCG@10	0.1825	0.1983	+0.0158
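
The absolute deltas in the table can be converted to relative gains with a few lines of arithmetic. The snippet uses only the four rows shown above; the relative figures are derived here and are not quoted from the paper.

```python
# (benchmark, metric, baseline, this paper), copied from the table above.
rows = [
    ("Netflix", "Recall@10", 0.1695, 0.1802),
    ("Netflix", "NDCG@10", 0.1064, 0.1165),
    ("MovieLens", "Recall@10", 0.2766, 0.2895),
    ("MovieLens", "NDCG@10", 0.1825, 0.1983),
]


def relative_gain(baseline: float, ours: float) -> float:
    """Improvement over the baseline, in percent."""
    return (ours - baseline) / baseline * 100.0


gains = {
    (bench, metric): round(relative_gain(base, ours), 2)
    for bench, metric, base, ours in rows
}

for (bench, metric), pct in gains.items():
    print(f"{bench:<10} {metric:<10} +{pct:.2f}%")
```

On these numbers the relative gains fall between roughly 4.7% and 9.5%, with NDCG@10 improving more than Recall@10 on both datasets.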

Experiment Figures

Visual examples of the prompts used for LLM augmentation.

Main Takeaways

LLMRec consistently outperforms state-of-the-art multi-modal methods (MMSSL, LATTICE) and augmentation methods (SimGCL) on both Netflix and MovieLens.
Ablation studies (as implied by the method description and results discussion) indicate that both edge augmentation and attribute augmentation contribute to performance gains.
The denoising mechanism (noise pruning + MAE) is effective in handling the inherent noise in LLM-generated data.

📚 Prerequisite Knowledge

Prerequisites

Collaborative Filtering (CF) and Graph Neural Networks (GNNs)
Bayesian Personalized Ranking (BPR) loss
Large Language Models (LLMs) for text generation
Masked Autoencoders (MAE)

Key Terms

Implicit feedback: Indirect user behavior (clicks, views) indicating preference, as opposed to explicit ratings

BPR: Bayesian Personalized Ranking—a pairwise loss function optimizing the ranking of positive items over negative ones

LightGCN: A simplified Graph Convolutional Network for recommendation that removes non-linearities and feature transformation to focus on neighborhood aggregation

MAE: Masked Autoencoder—a self-supervised learning technique where parts of the input are masked and the model learns to reconstruct them

Side information: Auxiliary data associated with users or items (e.g., reviews, descriptions, categories) used to supplement interaction data

False positive: Recorded interactions that do not reflect genuine user interest (e.g., accidental clicks)

False negative: Items a user would like but hasn't interacted with yet, usually treated as negatives in standard training

MMSSL: Multi-Modal Self-Supervised Learning—a baseline method maximizing mutual information between different modal views