Representation Learning with Large Language Models for Recommendation

📝 Paper Summary

LLM-enhanced Recommendation Representation Learning

RLMRec integrates LLMs with recommender systems by aligning the semantic space of LLM-generated profiles with ID-based collaborative signals via mutual information maximization, avoiding slow runtime inference.

Core Problem

Graph-based recommenders miss textual semantics by relying on IDs and noisy implicit feedback, while direct LLM usage is computationally expensive and prone to hallucination.

Why it matters:

Pure ID-based methods overlook valuable textual data, reducing representation quality.
Direct LLM inference (e.g., TALLRec) is not scalable for real-time systems with large user bases due to high latency.
Implicit feedback (clicks) contains noise like false negatives or popularity bias, which degrades model performance.

Concrete Example: With LLaMA2-13B as its backbone, TALLRec takes roughly 3.6 seconds to produce recommendations for a single user. In addition, a preliminary study shows that ChatGPT-refined recommendations underperform LightGCN because the LLM hallucinates non-existent items.
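To put the 3.6 s figure in perspective, the following back-of-the-envelope sketch converts it into wall-clock hours. The user count, the worker count, and the assumption of perfectly even sharding are hypothetical and are not taken from the paper.

```python
SECONDS_PER_USER = 3.6  # TALLRec with LLaMA2-13B, as quoted in the example above


def total_inference_hours(num_users: int, parallel_workers: int = 1) -> float:
    """Wall-clock hours to produce one recommendation list per user.

    Assumes the load is split perfectly evenly across workers (an idealisation).
    """
    total_seconds = SECONDS_PER_USER * num_users / parallel_workers
    return total_seconds / 3600.0


# Hypothetical platform with one million users (not a figure from the paper).
hours_single = total_inference_hours(1_000_000)
hours_sharded = total_inference_hours(1_000_000, parallel_workers=100)
print(f"1 worker: {hours_single:.0f} h; 100 workers: {hours_sharded:.0f} h")
```

Even with generous parallelism, a single refresh of every user's list costs hours of LLM compute under these assumptions, which is the motivation for confining the LLM to an offline step.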

Key Novelty

Model-Agnostic Representation Learning Framework (RLMRec)

Uses LLMs offline to generate denoised semantic profiles for users and items from raw text.
Grounds the alignment of Collaborative Filtering (CF) signals with LLM semantic signals theoretically, via Mutual Information Maximization.
Aligns representations via Contrastive Learning (pulling pairs together) or Generative Modeling (reconstructing masked semantic vectors).

Architecture

The RLMRec framework illustrating the alignment between the Collaborative Filtering view (ID-based) and the LLM view (Text-based).

Evaluation Highlights

Preliminary study: ChatGPT re-ranking performs worse than the LightGCN baseline because of hallucinations (it suggests non-candidate items).
Shows theoretically that maximizing mutual information between CF and LLM representations reduces the impact of noise.
Integrates with established backbones such as LightGCN and NGCF; quantitative performance metrics are not reported in the provided text snippet.

Breakthrough Assessment

7/10

Addresses critical scalability issues of LLM-based recommendation by moving LLM usage to representation alignment rather than inference, backed by theoretical mutual information grounding.

⚙️ Technical Details

Problem Definition

Setting: Top-K Recommendation with Implicit Feedback

Inputs: User set U, Item set V, Interaction history X, Auxiliary textual information

Outputs: Learned user/item representations e_u, e_v for ranking
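Once e_u and e_v are learned, ranking reduces to scoring every unseen item against the user. The sketch below assumes inner-product scoring, which is common for CF backbones such as LightGCN but is not stated in this summary; the function names and toy vectors are illustrative only.

```python
from typing import Dict, List, Set


def dot(a: List[float], b: List[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k_items(
    e_u: List[float],
    item_embeddings: Dict[str, List[float]],
    seen: Set[str],
    k: int,
) -> List[str]:
    """Return the k highest-scoring items the user has not interacted with."""
    scores = {
        v: dot(e_u, e_v) for v, e_v in item_embeddings.items() if v not in seen
    }
    return sorted(scores, key=lambda v: scores[v], reverse=True)[:k]


# Toy embeddings (hypothetical values).
e_u = [1.0, 0.5]
items = {
    "v1": [0.9, 0.1],
    "v2": [0.2, 0.8],
    "v3": [1.0, 1.0],
    "v4": [-0.5, 0.3],
}
print(top_k_items(e_u, items, seen={"v3"}, k=2))
```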

Pipeline Flow

Data Preprocessing: LLM generates User/Item Profiles from raw text
Encoding: Text Embedding Model converts profiles to semantic vectors
Representation Learning: GNN/CF Model learns ID-based embeddings
Alignment: Cross-view Mutual Information Maximization (Contrastive or Generative)

System Modules

Profile Generator

Generate denoised textual profiles for users and items

Model or implementation: Large Language Models (e.g., GPT-3.5 or LLaMA; the specific model is implied rather than stated)

Semantic Encoder

Encode text profiles into fixed-length semantic vectors

Model or implementation: Text Embedding Model T(.)

Recommender Backbone

Learn collaborative ID-based representations

Model or implementation: Any CF model (e.g., LightGCN, NGCF)
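As an illustration of what such a backbone computes, here is a minimal pure-Python sketch of LightGCN-style propagation: neighbor embeddings are averaged with symmetric degree normalization, with no feature transform or nonlinearity, and the final embedding is the mean over all layers. The toy graph, the initial vectors, and the function name are hypothetical, and user and item IDs are assumed to live in one shared namespace.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Vec = List[float]


def lightgcn_propagate(
    interactions: List[Tuple[str, str]],
    init: Dict[str, Vec],
    num_layers: int = 2,
) -> Dict[str, Vec]:
    """Linear propagation over the user-item graph, then a mean over layers."""
    neighbors = defaultdict(set)
    for u, v in interactions:
        neighbors[u].add(v)
        neighbors[v].add(u)

    dim = len(next(iter(init.values())))
    layers = [init]  # layer 0 holds the trainable ID embeddings
    for _ in range(num_layers):
        prev, cur = layers[-1], {}
        for node in init:
            agg = [0.0] * dim
            for nb in neighbors[node]:
                # Symmetric normalization: 1 / sqrt(deg(node) * deg(neighbor)).
                norm = 1.0 / math.sqrt(len(neighbors[node]) * len(neighbors[nb]))
                for d in range(dim):
                    agg[d] += norm * prev[nb][d]
            cur[node] = agg
        layers.append(cur)

    # Uniform average of layer 0 .. num_layers.
    return {
        n: [sum(layer[n][d] for layer in layers) / len(layers) for d in range(dim)]
        for n in init
    }


# Toy graph: two users, two items (hypothetical data).
interactions = [("u1", "v1"), ("u1", "v2"), ("u2", "v1")]
init = {"u1": [1.0, 0.0], "u2": [0.0, 1.0], "v1": [1.0, 1.0], "v2": [0.0, 2.0]}
final = lightgcn_propagate(interactions, init, num_layers=1)
print(final["u1"])
```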

Alignment Module

Align semantic (s) and relational (e) spaces

Model or implementation: MLP (sigma) + Contrastive Loss OR Generative Reconstruction
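The contrastive variant of this module can be sketched as an InfoNCE-style loss: each CF embedding is pulled toward its own projected semantic vector and pushed away from the other vectors in the batch. The cosine similarity, the temperature of 0.2, and the single linear projection standing in for the MLP are assumptions made for illustration, not details confirmed by this summary.

```python
import math
from typing import List

Vec = List[float]


def cosine(a: Vec, b: Vec) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den > 0 else 0.0


def project(s: Vec, weight: List[Vec]) -> Vec:
    """Linear stand-in for the MLP; weight has shape [d_cf][d_sem]."""
    return [sum(w * x for w, x in zip(row, s)) for row in weight]


def contrastive_alignment_loss(
    cf: List[Vec], sem: List[Vec], weight: List[Vec], temperature: float = 0.2
) -> float:
    """Mean InfoNCE loss; row i of cf and row i of sem form the positive pair."""
    projected = [project(s, weight) for s in sem]
    loss = 0.0
    for i, e_i in enumerate(cf):
        logits = [cosine(e_i, p) / temperature for p in projected]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_denom - logits[i]
    return loss / len(cf)


# Toy batch (hypothetical values): aligned pairs vs. deliberately swapped pairs.
cf = [[1.0, 0.0], [0.0, 1.0]]
sem = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
aligned = contrastive_alignment_loss(cf, sem, weight)
swapped = contrastive_alignment_loss(cf, sem[::-1], weight)
print(aligned < swapped)
```

The loss is small when matching pairs are the most similar entries in the batch and grows when they are not, which is the behavior the alignment term relies on.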

Novel Architectural Elements

Cross-view alignment framework using Mutual Information Maximization to bridge ID-based CF and Text-based LLM spaces.
Generative alignment (RLMRec-Gen) using masked reconstruction of semantic vectors from CF vectors.
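One plausible reading of the generative variant is sketched below: a subset of nodes has its input embedding replaced by a mask token, the backbone encodes the graph, and the encoded CF vector of each masked node is mapped into the semantic space and compared with that node's semantic vector. The mask ratio, the zero mask token, the linear map in place of an MLP, and the simplified 1 - cosine loss are all assumptions; the encoder itself is omitted.

```python
import math
import random
from typing import Dict, List, Set, Tuple

Vec = List[float]


def cosine(a: Vec, b: Vec) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den > 0 else 0.0


def mask_inputs(
    init: Dict[str, Vec], mask_ratio: float, mask_token: Vec, rng: random.Random
) -> Tuple[Dict[str, Vec], Set[str]]:
    """Replace the input embedding of a random subset of nodes with mask_token."""
    nodes = sorted(init)
    n_mask = max(1, int(len(nodes) * mask_ratio))
    masked = set(rng.sample(nodes, n_mask))
    masked_init = {n: (list(mask_token) if n in masked else init[n]) for n in nodes}
    return masked_init, masked


def reconstruction_loss(
    encoded: Dict[str, Vec], sem: Dict[str, Vec], masked: Set[str], weight: List[Vec]
) -> float:
    """Mean (1 - cosine) between mapped CF vectors and semantic vectors.

    weight has shape [d_sem][d_cf] and is a linear stand-in for an MLP.
    Only masked nodes contribute to the loss.
    """
    total = 0.0
    for n in masked:
        recon = [sum(w * x for w, x in zip(row, encoded[n])) for row in weight]
        total += 1.0 - cosine(recon, sem[n])
    return total / len(masked)


# Toy data (hypothetical). The backbone encoder is skipped: `encoded` plays the
# role of its output on the masked graph.
init = {"u1": [1.0, 0.0], "u2": [0.0, 1.0], "v1": [1.0, 1.0], "v2": [0.5, 0.5]}
masked_init, masked = mask_inputs(init, 0.5, [0.0, 0.0], random.Random(0))
encoded = {n: [1.0, 0.0] for n in init}
sem = {n: [2.0, 0.0, 0.0] for n in init}
weight = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
loss = reconstruction_loss(encoded, sem, masked, weight)
print(len(masked), round(loss, 6))
```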

Modeling

Base Model: Model-agnostic (works with LightGCN, NGCF, etc.)

Training Method: Multi-task learning (Rec Loss + Alignment Loss)

Objective Functions:

Purpose: Maximize Mutual Information between CF and Semantic representations.

Formally: Maximize a lower bound on I(e; s) via contrastive or generative density-ratio modeling.

Purpose: Standard recommendation optimization, trained jointly with the alignment term.

Formally: L_total = L_Recommender + L_Alignment
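Written out, the two objectives might take the following form. This is a sketch assuming the standard InfoNCE estimator: N is the number of sampled pairs, f is a positive critic that models the density ratio, and the bound holds in expectation. The exact form of f and any weighting between the two loss terms are not given in this summary.

```latex
\begin{aligned}
\mathcal{L}_{\text{Alignment}}
  &= -\,\mathbb{E}\left[\frac{1}{N}\sum_{i=1}^{N}
     \log \frac{f(\mathbf{s}_i, \mathbf{e}_i)}
               {\sum_{j=1}^{N} f(\mathbf{s}_j, \mathbf{e}_i)}\right], \\
I(\mathbf{e}; \mathbf{s})
  &\ge \log N - \mathcal{L}_{\text{Alignment}}, \\
\mathcal{L}_{\text{total}}
  &= \mathcal{L}_{\text{Recommender}} + \mathcal{L}_{\text{Alignment}}.
\end{aligned}
```

Minimizing the alignment loss therefore raises a lower bound on the mutual information between the CF representation e and the semantic representation s.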

Compute: Not reported in the paper

Comparison to Prior Work

vs. TALLRec: RLMRec avoids expensive runtime LLM inference by using LLMs only for offline representation alignment.
vs. LightGCN: RLMRec incorporates semantic textual signals via alignment, whereas LightGCN relies solely on ID/interaction graph.

Limitations

Depends on the quality of available textual data for profile generation.
Profile generation with LLMs is a one-time but potentially costly offline process.
Specific quantitative performance gains (Recall/NDCG) not included in the provided text snippet.

Reproducibility

Code: https://github.com/HKUDS/RLMRec

Uses the Amazon, Yelp, and Steam datasets. Prompt templates for profile generation are provided in the paper's appendix.

📊 Experiments & Results

Evaluation Setup

Top-K Recommendation

Benchmarks:

Amazon-Book (Recommendation)
Yelp (Recommendation)
Steam (Recommendation)

Metrics:

Not reported in the provided text
Statistical methodology: Not explicitly reported in the paper

Key Results

Benchmark	Metric	Baseline	This Paper	Δ
Inference latency (TALLRec, LLaMA2-13B)	Seconds per user	~3.6	Not reported in the paper	Not applicable

Experiment Figures

Comparison of LightGCN vs. LightGCN + ChatGPT Re-ranking on Amazon dataset.

Main Takeaways

Directly using LLMs (like ChatGPT) for re-ranking can degrade performance due to hallucinations and lack of candidate awareness (Preliminary Study).
LLM inference for recommendation (e.g., TALLRec) faces severe scalability challenges compared to traditional methods, at roughly 3.6 s per user with LLaMA2-13B.
Theoretical derivation shows that incorporating textual signals via mutual information maximization improves representation quality by mitigating noise.
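The first takeaway above concerns items that an LLM returns even though they were never in the candidate list. A minimal sketch of how such outputs could be detected is shown below; the function name and the item titles are hypothetical, and the preliminary study's actual evaluation protocol is not described in this summary.

```python
from typing import List, Sequence, Tuple


def split_hallucinated(
    reranked: Sequence[str], candidates: Sequence[str]
) -> Tuple[List[str], List[str]]:
    """Separate an LLM's re-ranked list into valid and non-candidate items.

    Valid items keep the LLM's order; repeated valid items are dropped.
    """
    allowed = set(candidates)
    valid: List[str] = []
    hallucinated: List[str] = []
    seen = set()
    for item in reranked:
        if item not in allowed:
            hallucinated.append(item)
        elif item not in seen:
            valid.append(item)
            seen.add(item)
    return valid, hallucinated


# Hypothetical candidate list from a base recommender and an LLM re-ranking of it.
candidates = ["Book A", "Book B", "Book C"]
llm_output = ["Book C", "Book Z", "Book A", "Book C"]
valid, hallucinated = split_hallucinated(llm_output, candidates)
print(valid, hallucinated)
```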

📚 Prerequisite Knowledge

Prerequisites

Collaborative Filtering (CF)
Graph Neural Networks (GNN)
Mutual Information Maximization
Contrastive Learning

Key Terms

Collaborative Filtering (CF): A method of making automatic predictions about the interests of a user by collecting preferences from many users.

Mutual Information Maximization: An optimization objective that increases the dependency between two random variables (here, CF and LLM representations).

Implicit Feedback: Indirect user behavior data (like clicks or views) rather than explicit ratings.

Hallucination: A phenomenon where LLMs generate plausible but factually incorrect or non-existent content (e.g., recommending fake items).

Masked Autoencoder (MAE): A self-supervised learning technique where parts of the input are hidden and the model tries to reconstruct them.

TALLRec: A baseline method that fine-tunes LLMs using instruction tuning for recommendation tasks.

LightGCN: A simplified Graph Convolutional Network for recommendation that linearly propagates user/item embeddings.

Contrastive Learning: A learning paradigm that pulls similar (positive) data pairs close and pushes dissimilar (negative) pairs apart in embedding space.