Large Language Model Can Interpret Latent Space of Sequential Recommender

📝 Paper Summary

LLM for Recommendation
Multimodal alignment

RecInterpreter aligns frozen sequential recommenders with LLaMA using a lightweight linear adapter, enabling the LLM to understand and textually describe hidden user behavior representations.

Core Problem

Current LLM-for-Rec approaches feed the LLM text prompts of the interaction history, so the LLM never sees the rich, compressed hidden representations learned by ID-based sequential recommenders.

Why it matters:

ID-based recommenders encode powerful sequential patterns that text prompts might miss or represent inefficiently.
Bridging this gap allows LLMs to reason over the internal state of existing high-performance recommender models.
Generative recommenders (like DreamRec) produce latent vectors for 'oracle' items that lack explicit IDs; current methods cannot easily translate these vectors into readable item names.

Concrete Example: A generative recommender like DreamRec might output a latent vector representing the 'ideal next item'. Without a bridge, the only option is to look up the nearest existing item ID, which can lose nuance. RecInterpreter lets LLaMA take that vector and describe in text which movie it represents, even if that movie is not in the candidate set.

Key Novelty

RecInterpreter: Interpreting Latent Space via Alignment

Treat recommender hidden states like a modality (similar to images in Flamingo/MiniGPT-4) and project them into the LLM's token space via a linear adapter.
Use a 'sequence-residual' prompt strategy: show the LLM the state *before* and *after* an interaction and ask it to describe the difference (the residual item).

Architecture

The RecInterpreter framework connecting a frozen SeqRec encoder to a frozen LLM via a trainable linear layer.

Evaluation Highlights

LLaMA achieves 97.89% accuracy on MovieLens in identifying the residual item from DreamRec's hidden representations using the sequence-residual prompt.
In sequence recovery (reconstructing history from one vector), LLaMA recovers >5 correct items for over 35% of test samples on MovieLens when aligned with Caser or SASRec.
Instantiated oracle items from DreamRec (via RecInterpreter) are preferred by ChatGPT over SASRec's top-1 recommendations in 50.53% of cases on MovieLens.

Breakthrough Assessment

7/10

Novel perspective on treating recommender embeddings as a 'modality' for LLMs. Strong empirical results on interpretability, though limited to small datasets.

⚙️ Technical Details

Problem Definition

Setting: Aligning the latent space of a pre-trained sequential recommender with the token space of an LLM to generate textual descriptions of items or sequences.

Inputs: A sequence of item IDs s transformed into a hidden representation h_s by a frozen recommender.

Outputs: Textual description generated by the LLM (e.g., list of movie titles or a specific target item).

Pipeline Flow

Sequence Encoding (Frozen Recommender) → Hidden Representation
Representation Adaptation (Linear Layer) → Soft Tokens
Prompt Construction (Interleaving Text + Soft Tokens) → LLaMA
Text Generation (Description of items)
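
The four stages above can be sketched end to end in a few lines. This is a minimal pure-Python illustration, not the authors' implementation: `linear_adapter`, `build_input_embeddings`, and `toy_embed_text` are hypothetical names, the toy widths (4 and 8) stand in for the recommender dimension and LLaMA's 4096, and the random weights stand in for a trained adapter. It also assumes that each hidden state becomes exactly one soft token.

```python
import random

def linear_adapter(h, weight, bias):
    """Project a recommender hidden vector h into the LLM embedding space."""
    return [sum(w * x for w, x in zip(row, h)) + b
            for row, b in zip(weight, bias)]

def build_input_embeddings(segments, embed_text, project):
    """Interleave text-token embeddings with projected soft tokens.

    segments is an ordered list of ("text", str) or ("state", vector) pairs.
    """
    sequence = []
    for kind, value in segments:
        if kind == "text":
            sequence.extend(embed_text(value))  # one embedding per text token
        else:
            sequence.append(project(value))     # one soft token per hidden state
    return sequence

# Toy sizes (assumptions): the real LLM width is 4096.
D_REC, D_LLM = 4, 8
rng = random.Random(0)
W = [[rng.uniform(-0.1, 0.1) for _ in range(D_REC)] for _ in range(D_LLM)]
b = [0.0] * D_LLM

def toy_embed_text(text):
    # Hypothetical stand-in for the frozen LLM's token-embedding lookup:
    # whitespace tokens mapped to deterministic pseudo-random vectors.
    vectors = []
    for token in text.split():
        r = random.Random(token)
        vectors.append([r.uniform(-1.0, 1.0) for _ in range(D_LLM)])
    return vectors

h_s = [0.5, -1.0, 0.25, 2.0]  # pretend output of the frozen recommender
inputs = build_input_embeddings(
    [("text", "This user has watched"),
     ("state", h_s),
     ("text", "List the movies.")],
    toy_embed_text,
    lambda h: linear_adapter(h, W, b),
)
print(len(inputs), "input positions, each of width", len(inputs[0]))
```

In a real setup the resulting sequence would be handed to the frozen LLM as input embeddings in place of token IDs, and only `W` and `b` would receive gradients.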

System Modules

Sequential Recommender Encoder

Encode item ID sequence into a fixed-size hidden vector.

Model or implementation: Varied (SASRec, GRU4Rec, Caser, DreamRec)

Lightweight Adapter

Project recommender hidden vector into LLM token embedding space.

Model or implementation: Linear Projection Layer

LLM

Generate text based on prompts containing projected embeddings.

Model or implementation: LLaMA-7B

Novel Architectural Elements

Application of multimodal alignment (via linear adapter) specifically to the latent space of ID-based sequential recommenders.
Sequence-Residual Prompting structure: <Text> <State_t> <Text> <State_t+1> <Question>.
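
To make the `<Text> <State_t> <Text> <State_t+1> <Question>` layout concrete, the sketch below splits a prompt template into ordered text and state segments. The template wording and the helper name `split_prompt` are illustrative assumptions; the paper's exact prompt text is not reproduced here.

```python
import re

# Matches the two state placeholders; the longer one is listed first.
PLACEHOLDER = re.compile(r"(<State_t\+1>|<State_t>)")

def split_prompt(template):
    """Split a template into ("text", str) and ("state", name) segments, in order."""
    segments = []
    for piece in PLACEHOLDER.split(template):
        if not piece.strip():
            continue  # skip empty gaps between adjacent placeholders
        if PLACEHOLDER.fullmatch(piece):
            segments.append(("state", piece.strip("<>")))
        else:
            segments.append(("text", piece.strip()))
    return segments

# Hypothetical wording that follows the layout described above.
template = (
    "The user's history before the new interaction: <State_t> "
    "The history after it: <State_t+1> "
    "Which movie was added?"
)
segments = split_prompt(template)
for kind, value in segments:
    print(kind, "|", value)
```

Each `state` segment marks a position where a projected hidden representation would be substituted, while `text` segments would go through the LLM's ordinary token embeddings.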

Modeling

Base Model: LLaMA-7B

Training Method: Supervised alignment training (training only the linear adapter)

Objective Functions:

Purpose: Minimize the difference between generated text and ground truth item descriptions.

Formally: Standard autoregressive language modeling loss conditioned on the projected hidden states.
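
One way to write this objective out, under notation assumed here rather than taken from the paper: let $y_1, \dots, y_T$ be the tokens of the ground-truth description, $x$ the text part of the prompt, and $W, b$ the weights of the linear adapter (the bias $b$ is an assumption). With the LLM and the recommender frozen, only $W$ and $b$ are optimized:

```latex
\mathcal{L}(W, b) = -\sum_{t=1}^{T} \log p_{\mathrm{LLM}}\!\left(y_t \,\middle|\, y_{<t},\; x,\; W h_s + b\right)
```

For the sequence-residual prompt, the conditioning would contain two projected states (before and after the interaction) instead of the single $W h_s + b$.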

Adaptation: Linear projection layer (recommender dim -> 4096)

Trainable Parameters: Only the linear projection layer
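
A quick count shows how small the trainable part is. The recommender width of 64 below is a hypothetical value chosen for illustration (the summary does not give it), the bias term is an assumption, and 7 billion is the nominal size implied by the name LLaMA-7B.

```python
def linear_param_count(d_in, d_out, bias=True):
    """Number of parameters in a dense layer mapping d_in to d_out."""
    return d_in * d_out + (d_out if bias else 0)

D_LLM = 4096                  # target width stated above
D_REC = 64                    # hypothetical recommender width, not from the paper
NOMINAL_LLM_PARAMS = 7_000_000_000

adapter_params = linear_param_count(D_REC, D_LLM)
fraction = adapter_params / NOMINAL_LLM_PARAMS
print(f"adapter parameters: {adapter_params:,}")
print(f"fraction of nominal LLM size: {fraction:.6%}")
```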

Training Data:

Pairs of (Hidden Representation, Textual Description of Items)
MovieLens-100K and subsampled Steam dataset

Key Hyperparameters:

learning_rate: 0.0005 (max after warmup)
batch_size: 256
epochs: 20
weight_decay: Searched in [1e-4, 1e-5, 1e-6]
warmup_epochs: 5
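
As a rough illustration of how these values interact, the sketch below ramps the learning rate linearly to its maximum over the warmup epochs. The linear ramp and the constant rate afterwards are assumptions made for this example; the summary only states the peak value and the warmup length, not the schedule shape.

```python
MAX_LR = 5e-4        # learning_rate (max after warmup)
WARMUP_EPOCHS = 5    # warmup_epochs
EPOCHS = 20          # epochs

def lr_at_epoch(epoch):
    """Learning rate for a 0-indexed epoch: linear warmup, then held constant.

    The post-warmup behaviour is a placeholder; a decay schedule may be used
    in practice.
    """
    if epoch < WARMUP_EPOCHS:
        return MAX_LR * (epoch + 1) / WARMUP_EPOCHS
    return MAX_LR

schedule = [lr_at_epoch(e) for e in range(EPOCHS)]
for epoch, lr in enumerate(schedule[:6]):
    print(epoch, f"{lr:.6f}")
```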

Compute: Single NVIDIA A40 GPU. Training takes ~2 hours/epoch (MovieLens) and ~6 hours/epoch (Steam).

Comparison to Prior Work

vs. LLM4Rec: RecInterpreter feeds *embeddings* (latent representations) to LLM, not text history.
vs. MiniGPT-4: Adapts the concept to Recommender Systems instead of Computer Vision.
vs. TALLRec [not cited in paper]: TALLRec fine-tunes the LLM itself on rec tasks; RecInterpreter keeps LLM frozen and tunes an adapter for interpretation.

Limitations

Linear projection might be too simple; advanced adapters (Q-Former) not explored.
Datasets used are relatively small (MovieLens-100K).
Recovering full sequence from one vector is inherently lossy/difficult.
Requires access to recommender internal states (white-box).

Reproducibility

Code: https://github.com/YangZhengyi98/RecInterpreter

Code publicly available. Datasets are public (MovieLens, Steam). Recommender backbones are standard.

📊 Experiments & Results

Evaluation Setup

Interpretability tasks: Sequence Recovery (list all items) and Sequence Residual (identify added item).

Benchmarks:

MovieLens-100K (Movie Recommendation / Interpretation)
Steam (Game Recommendation / Interpretation)

Metrics:

Recovery Count (number of correctly recovered items)
Residual Identification Accuracy
ChatGPT-based preference (for oracle instantiation)
Statistical methodology: Not explicitly reported in the paper
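
The first two metrics above can be computed with a few lines of code. The sketch below assumes that a generated title counts as correct when it matches a ground-truth title after lowercasing and whitespace normalization; the paper's actual matching rule is not given in this summary, and the function names are hypothetical.

```python
def normalize(title):
    """Lowercase and collapse whitespace so trivial variations still match."""
    return " ".join(title.lower().split())

def recovery_count(generated_titles, true_titles):
    """Number of distinct ground-truth items that appear in the generated list."""
    truth = {normalize(t) for t in true_titles}
    predicted = {normalize(t) for t in generated_titles}
    return len(truth & predicted)

def residual_accuracy(predictions, targets):
    """Percentage of samples whose predicted residual item matches the target."""
    if len(predictions) != len(targets) or not targets:
        raise ValueError("predictions and targets must be non-empty and aligned")
    hits = sum(normalize(p) == normalize(t) for p, t in zip(predictions, targets))
    return 100.0 * hits / len(targets)

# Made-up titles, for illustration only.
count = recovery_count(["Toy Story", "heat", "Alien"], ["Toy Story", "Heat", "Fargo"])
accuracy = residual_accuracy(["Heat", "Alien"], ["heat", "Fargo"])
print(count, accuracy)
```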

Key Results

Benchmark	Metric	Baseline (%)	This Paper (%)	Δ (points)
Sequence Residual Task: Accuracy of LLaMA identifying the 'residual' (newly added) item by comparing hidden states t and t+1.
MovieLens	Accuracy	52.63	52.63	0.00
MovieLens	Accuracy	Not applicable	97.89	Not applicable
Steam	Accuracy	Not applicable	86.33	Not applicable
Oracle Instantiation: Using RecInterpreter to 'decode' DreamRec's ideal latent vector into text, then comparing preference vs SASRec via ChatGPT.
MovieLens	ChatGPT Preference	35.79	50.53	+14.74

Experiment Figures

Distribution of the number of correctly recovered items (0 to 8+) for Sequence-Recovery task across datasets and backbones.

Main Takeaways

LLMs can indeed interpret the latent space of ID-based recommenders when aligned via a simple linear adapter.
The 'Sequence-Residual' prompt (comparing state t vs t+1) is much more effective than 'Sequence-Recovery' (decoding state t alone).
DreamRec's latent space (diffusion-based) appears more 'interpretable' or contains clearer signal for the LLM than RNN/CNN baselines.
This framework allows 'instantiating' abstract vectors from generative recommenders into concrete items, even those outside the training set.

📚 Prerequisite Knowledge

Prerequisites

Sequential Recommendation (SASRec, GRU4Rec)
Large Language Models (LLaMA)
Multimodal Alignment (Linear Projection/Adapters)

Key Terms

RecInterpreter: The proposed framework that uses a linear adapter to map recommender hidden states to LLM token embeddings.

Sequence-Recovery Prompt: A task where the LLM must list all items in a user's history given only the projected hidden representation of the sequence.

Sequence-Residual Prompt: A task where the LLM is given representations of a sequence before and after an interaction and must describe the added item.

DreamRec: A generative sequential recommender based on diffusion models that generates a latent vector for the ideal next item.

ID-based recommender: Recommender systems that represent items as discrete IDs and learn embeddings, as opposed to content-based systems.

Adapter: A lightweight trainable module (here, a linear layer) that projects features from one model's space to another's.