Recommendation with Generative Models

📝 Paper Summary

Generative AI in Recommender Systems · Foundations of Recommender Systems

The authors propose a paradigm shift from discriminative filtering to Generative Recommender Systems (Gen-RecSys), utilizing models that learn data distributions to generate structured, textual, and multimodal outputs rather than merely ranking existing items.

Core Problem

Traditional discriminative recommender systems model $P(Y|X)$ to score and rank items from a fixed catalog. This limits their ability to handle cold-start scenarios, generate explanations, or create complex structured outputs such as bundles and creative content.

Why it matters:

Standard systems struggle with data sparsity and cold-start problems where interaction history is minimal.
Discriminative models prioritize accuracy over transparency, failing to provide natural language explanations or reasoning for recommendations.
Non-generative systems cannot create new content (e.g., personalized fashion designs or text) or support complex, multi-turn conversational interactions.

Concrete Example: A traditional system can predict a movie rating but cannot generate a personalized review explaining *why* a user would like the movie. In contrast, a generative system (such as the example in Figure 2.1) can synthesize a unique cocktail recipe ('Pomberrytini') or write a beer review ('Pours a very dark brown...') based on learned user preferences.

Key Novelty

Generative Recommender Systems (Gen-RecSys) Framework

Redefines recommendation from a discriminative task (predicting a label given an item) to a generative task (estimating the probability distribution of items/data given a user/label).
Classifies systems by output capability: Structured Outputs (bundles/sequences), Text Generation (explanations/dialogue), and Multimedia Generation (images/audio), utilizing models like VAEs and LLMs.
Distinguishes between 'Directly trained models' (learned from scratch on interaction data) and 'Pretrained Generative Models' (adapting Foundation Models like GPT-4 or CLIP via fine-tuning or prompting).

Breakthrough Assessment

8/10

This work establishes a comprehensive taxonomy and foundational theory for the emerging field of Gen-RecSys, unifying diverse approaches (VAEs, LLMs, Diffusion) under a single framework, though it is a survey/monograph rather than a single empirical study.

⚙️ Technical Details

Problem Definition

Setting: Transition from Discriminative Modeling to Generative Modeling in Recommendation

Inputs: User $u$, Item $i$, Context (text, image, history)

Outputs: Samples from the learned distribution $P(X|Y=y)$, which can be items, text explanations, or new media content.
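The distinction between the discriminative $P(Y|X)$ and the generative $P(X|Y=y)$ can be illustrated with simple counting. The sketch below is a toy under stated assumptions, not the authors' method: the user segments, item titles, and interaction pairs are invented, and plain frequency estimates stand in for a learned model.

```python
import random
from collections import Counter

# Hypothetical toy data: (user_segment, item) pairs.
# Y = user segment (the "label"), X = item (the "observation").
interactions = [
    ("sci-fi fan", "Dune"), ("sci-fi fan", "Dune"), ("sci-fi fan", "Alien"),
    ("sci-fi fan", "Amelie"), ("romance fan", "Amelie"),
    ("romance fan", "Amelie"), ("romance fan", "Dune"),
]

joint = Counter(interactions)                  # counts of (y, x)
count_y = Counter(y for y, _ in interactions)  # marginal counts of Y
count_x = Counter(x for _, x in interactions)  # marginal counts of X


def p_y_given_x(y, x):
    """Discriminative view: probability of a label given an observation."""
    return joint[(y, x)] / count_x[x]


def p_x_given_y(x, y):
    """Generative view: probability of an item given a label."""
    return joint[(y, x)] / count_y[y]


def sample_items(y, n, seed=0):
    """Draw n items from the estimated P(X | Y=y)."""
    rng = random.Random(seed)
    items = sorted(count_x)
    weights = [p_x_given_y(x, y) for x in items]
    return rng.choices(items, weights=weights, k=n)


print(p_y_given_x("sci-fi fan", "Dune"))  # 2/3
print(p_x_given_y("Dune", "sci-fi fan"))  # 0.5
print(sample_items("romance fan", 5))     # random draws, never "Alien"
```

Both quantities come from the same joint counts; only the generative view yields a distribution from which new item samples can be drawn for a given label.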

Pipeline Flow

Data Representation (Interactions/Modalities)
Generative Modeling (Distribution Learning)
Output Generation (Sampling/Synthesis)

System Modules

Data Representation

Represent users and items as matrices, sets, graphs, or sequences to capture interaction history and modalities.

Model or implementation: Various (Matrix Factorization, Graph Embeddings, Token Sequences)
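As an illustration of the four representations listed above, the following minimal sketch builds a matrix, per-user sets, a bipartite graph, and per-user sequences from a single interaction log. The log contents and variable names are hypothetical.

```python
from collections import defaultdict

# Hypothetical interaction log: (user, item, timestamp).
log = [
    ("u1", "i1", 1), ("u1", "i3", 2), ("u2", "i2", 1),
    ("u2", "i1", 3), ("u3", "i3", 2), ("u1", "i2", 5),
]
users = sorted({u for u, _, _ in log})
items = sorted({i for _, i, _ in log})
u_idx = {u: n for n, u in enumerate(users)}
i_idx = {i: n for n, i in enumerate(items)}

# 1) Matrix: binary user-item matrix (implicit feedback).
matrix = [[0] * len(items) for _ in users]
for u, i, _ in log:
    matrix[u_idx[u]][i_idx[i]] = 1

# 2) Sets: each user as the set of items they interacted with.
item_sets = defaultdict(set)
for u, i, _ in log:
    item_sets[u].add(i)

# 3) Graph: bipartite adjacency list connecting users and items
#    (user and item names are assumed not to collide).
graph = defaultdict(set)
for u, i, _ in log:
    graph[u].add(i)
    graph[i].add(u)

# 4) Sequences: each user's items in timestamp order.
sequences = {u: [] for u in users}
for u, i, _ in sorted(log, key=lambda e: e[2]):
    sequences[u].append(i)

print(matrix)           # [[1, 1, 1], [1, 1, 0], [0, 0, 1]]
print(sequences["u1"])  # ['i1', 'i3', 'i2']
```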

Generative Model Core

Learn the underlying probability distribution of the data $P(X|Y)$ (e.g., items given a user profile).

Model or implementation: VAE, GAN, Diffusion, or LLM
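A minimal sketch of what learning $P(X|Y)$ can look like for collaborative filtering, loosely modeled on a Mult-VAE-style architecture: a user's interaction vector is encoded into a Gaussian latent variable, and a decoder maps the latent to a softmax distribution over all items. Assumptions: single linear encoder and decoder layers, tiny dimensions, and random untrained weights. The class and method names are hypothetical, and this is not the VAE-CF reference implementation.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class TinyMultVAE:
    """Illustrative VAE over a user's item vector; weights are random, not trained."""

    def __init__(self, n_items, latent_dim, seed=0):
        self.rng = random.Random(seed)

        def init(rows, cols):
            return [[self.rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.W_mu = init(latent_dim, n_items)      # encoder head for the mean
        self.W_logvar = init(latent_dim, n_items)  # encoder head for the log-variance
        self.W_dec = init(n_items, latent_dim)     # decoder: latent -> item logits

    def encode(self, x):
        norm = math.sqrt(sum(v * v for v in x)) or 1.0
        x = [v / norm for v in x]  # L2-normalize the interaction vector
        return matvec(self.W_mu, x), matvec(self.W_logvar, x)

    def forward(self, x, sample=True):
        """Return P(item | user) over the catalog plus q(z | x) parameters."""
        mu, logvar = self.encode(x)
        if sample:  # reparameterization: z = mu + sigma * eps, eps ~ N(0, 1)
            z = [m + math.exp(0.5 * lv) * self.rng.gauss(0.0, 1.0)
                 for m, lv in zip(mu, logvar)]
        else:       # at inference time the mean is commonly used instead
            z = mu
        return softmax(matvec(self.W_dec, z)), mu, logvar


model = TinyMultVAE(n_items=5, latent_dim=2)
probs, mu, logvar = model.forward([1, 0, 1, 0, 0], sample=False)
print([round(p, 3) for p in probs])  # five probabilities summing to 1
```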

Sampler / Generator

Sample from the learned distribution to produce final recommendations or content.

Model or implementation: Sampling Algorithm
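For the sampler, one standard option is the Gumbel-top-k trick, which draws k distinct items from a categorical distribution; greedy top-k is the deterministic alternative. The sketch below assumes the generative core has already produced a probability for every catalog item and that previously seen items should be excluded. The function names are illustrative.

```python
import math
import random


def sample_recommendations(probs, seen, k, seed=None):
    """Draw k distinct unseen items using the Gumbel-top-k trick.

    Adding independent Gumbel(0, 1) noise to each log-probability and keeping
    the k largest keys samples k items without replacement from the
    categorical distribution renormalized over the eligible items.
    """
    rng = random.Random(seed)
    keys = []
    for item, p in enumerate(probs):
        if item in seen or p <= 0:
            continue  # skip items already in the user's history
        u = rng.random()
        while u == 0.0:  # avoid log(0); random() can return exactly 0.0
            u = rng.random()
        gumbel = -math.log(-math.log(u))
        keys.append((math.log(p) + gumbel, item))
    keys.sort(reverse=True)
    return [item for _, item in keys[:k]]


def top_k(probs, seen, k):
    """Deterministic alternative: the k most probable unseen items."""
    eligible = [i for i in range(len(probs)) if i not in seen]
    return sorted(eligible, key=lambda i: probs[i], reverse=True)[:k]


probs = [0.05, 0.4, 0.1, 0.3, 0.15]  # e.g. output of a generative model
print(top_k(probs, seen={1}, k=2))   # [3, 4]
print(sample_recommendations(probs, seen={1}, k=2, seed=0))  # two distinct unseen items
```

Sampling rather than always taking the top-k lets repeated requests surface different plausible items, which is one way a generative recommender can trade a little accuracy for diversity.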

Novel Architectural Elements

Unification of ID-based probabilistic models (VAEs) and Content-based Foundation Models (LLMs) under a single Generative Recommendation definition.
Framework for 'Generative Retrieval' where items are represented by semantic IDs and predicted autoregressively.
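To make "semantic IDs predicted autoregressively" concrete, here is a toy sketch under strong assumptions: item embeddings are 2-D, the residual-quantization codebooks are hand-picked rather than learned (TIGER learns them with an RQ-VAE), and a fixed score table stands in for the trained sequence model. A prefix trie restricts each decoding step to codes that lead to real items. All names are hypothetical.

```python
def nearest(codebook, v):
    """Index of the codeword closest to v (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(c, v)) for c in codebook]
    return dists.index(min(dists))


def semantic_id(embedding, codebooks):
    """Residual quantization: each level encodes what the previous levels missed."""
    residual = list(embedding)
    code = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        code.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tuple(code)


def build_trie(ids):
    """Prefix tree over all valid semantic IDs in the catalog."""
    trie = {}
    for sid in ids:
        node = trie
        for tok in sid:
            node = node.setdefault(tok, {})
    return trie


def constrained_greedy_decode(score_fn, trie, depth):
    """Emit one code per step, choosing only among prefixes of real items."""
    prefix, node = [], trie
    for _ in range(depth):
        best = max(node, key=lambda tok: score_fn(tuple(prefix), tok))
        prefix.append(best)
        node = node[best]
    return tuple(prefix)


# Toy, hand-picked codebooks (a real system would learn them).
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],   # level 1: coarse
    [[0.0, 0.0], [0.2, -0.2]],  # level 2: fine
]
item_emb = {"A": [1.0, 1.0], "B": [1.2, 0.8], "C": [0.15, -0.15]}
ids = {name: semantic_id(e, codebooks) for name, e in item_emb.items()}
print(ids)  # {'A': (1, 0), 'B': (1, 1), 'C': (0, 1)}

# Hypothetical stand-in for a sequence model's next-code scores given the prefix.
scores = {(): {0: 0.2, 1: 0.8}, (1,): {0: 0.3, 1: 0.7}, (0,): {1: 1.0}}
decoded = constrained_greedy_decode(lambda p, t: scores[p].get(t, 0.0),
                                    build_trie(ids.values()), depth=2)
print(decoded)  # (1, 1), i.e. item "B"
```

Items with similar embeddings share ID prefixes (A and B both start with code 1), which is what lets an autoregressive decoder generalize across related items.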

Modeling

Base Model: Varies by application (e.g., VAE-CF for collaborative filtering, GPT-4/Llama for text, CLIP for multimodal)

Training Method: Varies (VAE training, Fine-tuning, Prompt Engineering)

Objective Functions:

Purpose: Maximize the likelihood of observed data under the generative distribution.
Formally: VAEs optimize the Evidence Lower Bound (ELBO), a tractable lower bound on the log-likelihood of the data.

Purpose: Align generated text with user preferences.
Formally: LLMs are trained with next-token prediction and can be further aligned with Reinforcement Learning from Human Feedback (RLHF).
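The two objectives can be written down compactly. For a Mult-VAE-style model (the multinomial-likelihood form associated with VAE-CF), the ELBO for a user's interaction vector $x_u$ with a diagonal-Gaussian posterior is

$$\log p(x_u) \ge \mathbb{E}_{q(z_u|x_u)}\left[\sum_i x_{ui}\log \pi_i(z_u)\right] - \mathrm{KL}\big(q(z_u|x_u)\,\|\,\mathcal{N}(0, I)\big),$$

where $\pi(z_u)$ is the decoder's softmax over items; Mult-VAE additionally scales the KL term by an annealed weight $\beta$. Next-token prediction minimizes the mean of $-\log p(w_t | w_{<t})$ over the sequence. The sketch below evaluates both losses for a single example from precomputed logits. It is illustrative plain Python rather than a training loop, and RLHF is omitted.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - lse for v in logits]


def neg_elbo_mult(x, logits, mu, logvar, beta=1.0):
    """Negative (beta-weighted) ELBO for one user, multinomial likelihood.

    x          : interaction counts over the catalog (e.g. 0/1 clicks)
    logits     : decoder outputs for one sample z ~ q(z | x) (one-sample estimate)
    mu, logvar : parameters of the diagonal Gaussian q(z | x)
    beta = 1.0 gives the standard ELBO.
    """
    log_pi = log_softmax(logits)
    recon = sum(xi * lp for xi, lp in zip(x, log_pi))  # log p(x | z)
    # Closed-form KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions.
    kl = -0.5 * sum(1 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))
    return -(recon - beta * kl)


def next_token_loss(step_logits, targets):
    """Mean cross-entropy of each target token under its step's logits."""
    total = 0.0
    for logits, t in zip(step_logits, targets):
        total -= log_softmax(logits)[t]
    return total / len(targets)


print(neg_elbo_mult([1, 0, 1], [0.0, 0.0, 0.0], mu=[0.0, 0.0], logvar=[0.0, 0.0]))  # 2*ln(3)
print(next_token_loss([[2.0, 0.0, 0.0], [0.0, 0.0, 0.0]], targets=[0, 2]))
```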

Training Data:

User-item interaction matrices (explicit ratings or implicit feedback)
Multimodal datasets (text, images, videos)

Compute: Not reported in the paper

Comparison to Prior Work

vs. Matrix Factorization: Gen-RecSys (like VAE-CF) captures complex, non-linear probability distributions over items and handles sparse data better than MF's fixed dot product between user and item vectors.
vs. Discriminative NN (NeuMF): Gen-RecSys can generate new data samples and explanations, whereas NeuMF is limited to score prediction.

Limitations

Generative models, especially LLMs, can hallucinate or generate non-factual explanations.
Inference is computationally expensive compared with traditional dot-product retrieval.
Potential for societal harms including bias amplification, filter bubbles, and manipulation (discussed in Chapter 7).

Reproducibility

No replication artifacts are mentioned in the paper (this is a theoretical survey/monograph chapter). It references external works (e.g., VAE-CF, TIGER, GPT-3), some of which have their own public implementations.

📊 Experiments & Results

Evaluation Setup

Survey of evaluation methodologies (Chapter 6) rather than a specific experiment.

Benchmarks:

Standard RecSys Datasets (Collaborative Filtering / Top-k Recommendation)

Metrics:

Accuracy (Precision, Recall, NDCG)
Diversity
Novelty
Fairness
Statistical methodology: Not explicitly reported in the paper
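The accuracy, diversity, and novelty metrics above can be sketched for a single ranked list as follows. Assumptions: binary relevance, diversity measured as the share of top-k item pairs from different categories, and novelty as the mean self-information of item popularity. These are common definitions, not necessarily the ones used in the surveyed chapter. Fairness is omitted because it depends on which user or item groups are compared.

```python
import math
from itertools import combinations


def precision_at_k(ranked, relevant, k):
    return sum(1 for i in ranked[:k] if i in relevant) / k


def recall_at_k(ranked, relevant, k):
    return sum(1 for i in ranked[:k] if i in relevant) / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG: discounted gain divided by the best achievable gain."""
    dcg = sum(1 / math.log2(pos + 2) for pos, i in enumerate(ranked[:k]) if i in relevant)
    ideal = sum(1 / math.log2(pos + 2) for pos in range(min(len(relevant), k)))
    return dcg / ideal


def intra_list_diversity(ranked, category, k):
    """Share of item pairs in the top-k that belong to different categories."""
    pairs = list(combinations(ranked[:k], 2))
    if not pairs:
        return 0.0
    return sum(category[a] != category[b] for a, b in pairs) / len(pairs)


def novelty(ranked, popularity, n_users, k):
    """Mean self-information -log2(p(i)) of the top-k, with p(i) = popularity / n_users."""
    return sum(-math.log2(popularity[i] / n_users) for i in ranked[:k]) / k


# Hypothetical example data.
ranked = ["a", "b", "c", "d"]
relevant = {"a", "c", "e"}
category = {"a": "drama", "b": "drama", "c": "comedy", "d": "horror"}
popularity = {"a": 50, "b": 10, "c": 5, "d": 1}

print(round(precision_at_k(ranked, relevant, 3), 3))                # 0.667
print(round(recall_at_k(ranked, relevant, 3), 3))                   # 0.667
print(round(ndcg_at_k(ranked, relevant, 3), 3))                     # 0.704
print(round(intra_list_diversity(ranked, category, 3), 3))          # 0.667
print(round(novelty(ranked, popularity, n_users=100, k=3), 3))      # 2.881
```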

Main Takeaways

Generative models (like VAEs) generally outperform non-generative baselines (like MF/NeuMF) in top-k recommendation quality, particularly in sparse data scenarios.
LLMs enable zero-shot and few-shot recommendation via in-context learning, which helps in cold-start scenarios; this capability is absent in traditional ID-based models.
Generative frameworks allow for 'Whole-page generation' and 'Bundle recommendation', creating coherent sets of items rather than isolated rankings.
Evaluation of Gen-RecSys requires new metrics beyond accuracy, including assessments of text quality, explanation faithfulness, and generative diversity.
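A minimal sketch of the in-context, cold-start setting: the prompt carries a few demonstrations plus a text description of a new user, so no user-specific training or interaction history is needed. The template wording, function name, and example data are hypothetical, and the call to an actual LLM is deliberately left out because it depends on the provider's API.

```python
def build_few_shot_prompt(examples, new_user_profile, candidates):
    """Assemble an in-context prompt: k demonstrations, then the cold-start query.

    examples         : list of (profile_text, recommended_item) demonstrations;
                       an empty list gives a zero-shot prompt
    new_user_profile : free-text description of a user with no interaction history
    candidates       : items the model is allowed to choose from
    """
    lines = ["You recommend one item from the candidate list for each user.", ""]
    for profile, item in examples:
        lines.append(f"User: {profile}")
        lines.append(f"Recommendation: {item}")
        lines.append("")
    lines.append(f"Candidates: {', '.join(candidates)}")
    lines.append(f"User: {new_user_profile}")
    lines.append("Recommendation:")
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    examples=[("loves hard sci-fi novels", "Dune"), ("enjoys romantic comedies", "Amelie")],
    new_user_profile="new user; says they like space exploration",
    candidates=["Dune", "Amelie", "Interstellar"],
)
print(prompt)
```

Restricting the answer to an explicit candidate list is one simple way to keep the generated recommendation inside the catalog.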

📚 Prerequisite Knowledge

Prerequisites

Collaborative Filtering (Matrix Factorization)
Deep Generative Models (VAE, GAN, Diffusion)
Large Language Models (Transformers)
Bayesian Probability (Priors, Posteriors)

Key Terms

Gen-RecSys: Recommender Systems with Generative Models—systems integrating generative AI to enhance prediction or generate complex outputs like text or images.

VAE: Variational Autoencoder—a deep generative model that learns probabilistic distributions of data in a latent space, often used for collaborative filtering.

LLM: Large Language Model—transformer-based models like GPT-4 capable of generating coherent text and performing zero-shot learning.

Cold Start: The challenge of recommending items to new users or suggesting new items with no prior interaction history.

RAG: Retrieval-Augmented Generation—combining information retrieval with generative models to provide contextually relevant and accurate outputs.

Discriminative Model: A model that estimates the probability of a label given an observation ($P(Y|X)$), such as predicting a rating given a user-item pair.

Generative Model: A model that estimates the distribution of data given a label ($P(X|Y)$), allowing the generation of new data samples.

Zero-shot Learning: The ability of a model (usually an LLM) to perform a task without explicit training examples, often using in-context learning.

In-context Learning: A capability of LLMs to understand tasks from prompts and examples provided at inference time without parameter updates.
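Following the RAG definition above, a retrieve-then-generate recommender can be sketched in two steps: rank catalog items by embedding similarity to the request, then place the retrieved item facts in the prompt so the generator recommends only real items. The catalog, the three-dimensional vectors, and the query embedding are hypothetical placeholders for a real embedding model, and the LLM call itself is omitted.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query_vec, catalog, k):
    """Return the k catalog entries whose embeddings are most similar to the query."""
    ranked = sorted(catalog, key=lambda it: cosine(query_vec, it["vec"]), reverse=True)
    return ranked[:k]


def build_rag_prompt(request, retrieved):
    """Ground the generator in retrieved item facts to reduce hallucinated items."""
    context = "\n".join(f"- {it['title']}: {it['desc']}" for it in retrieved)
    return (
        "Recommend only from the items below and explain why.\n"
        f"Items:\n{context}\n"
        f"Request: {request}\n"
        "Answer:"
    )


# Hypothetical catalog with hand-made embeddings.
catalog = [
    {"title": "Dune", "desc": "desert-planet science fiction", "vec": [0.9, 0.1, 0.0]},
    {"title": "Amelie", "desc": "whimsical Parisian romance", "vec": [0.0, 0.2, 0.9]},
    {"title": "Alien", "desc": "sci-fi horror aboard a spaceship", "vec": [0.8, 0.5, 0.0]},
]
query = [1.0, 0.3, 0.0]  # hypothetical embedding of the user's request
hits = retrieve(query, catalog, k=2)
rag_prompt = build_rag_prompt("a tense space movie", hits)
print([h["title"] for h in hits])  # ['Dune', 'Alien']
print(rag_prompt)
```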