Privacy Risks of LLM-Empowered Recommender Systems: An Inversion Attack Perspective

📝 Paper Summary

Privacy in Recommender Systems, LLM Inversion Attacks, Adversarial Machine Learning

This paper demonstrates that sensitive user interaction histories and demographics in LLM-powered recommender systems can be reconstructed from output logits using an inversion attack enhanced by similarity-guided refinement.

Core Problem

LLM-empowered recommender systems (RecSys) construct highly sensitive prompts containing user history and profiles, but exposing the model's output logits allows adversaries to reconstruct this private information.

Why it matters:

LLM-RecSys explicitly incorporate detailed personal data (history, age, gender) into prompts, making leaks far more severe than in traditional ID-based systems
Standard inversion attacks focus on general text generation; RecSys tasks often yield sparse text (e.g., 'Yes/No') but rich logits, requiring specialized attack methods
Malicious users or eavesdroppers can uncover proprietary system instructions and private user data simply by analyzing the probability distributions of the system's responses

Concrete Example: A user prompt might be 'The user is a 25-year-old male who liked Matrix and Inception...'. The RecSys outputs logits for a binary prediction (Like/Dislike). An attacker captures these logits and uses the proposed inversion model to regenerate the original text: 'User: 25 male, History: Matrix, Inception'.

Key Novelty

Similarity-Guided Refinement for RecSys Inversion

Adapts the vec2text framework to the recommendation domain by training on a novel synthetic dataset of diverse RecSys prompt templates
Introduces an iterative refinement loop: the attacker generates candidate prompts, feeds them back into the victim model to get logits, and selects the candidate whose logits are most similar (cosine similarity) to the original observed logits

Architecture

Figure: the proposed inversion framework pipeline

Evaluation Highlights

Reconstructs nearly 65% of user-interacted item titles from the output logits of LLM-based recommenders
Correctly infers sensitive demographic attributes (age and gender) in 87% of cases
Proposed similarity-guided refinement strategy yields an additional 5–13% improvement in reconstruction fidelity over the base inversion model

Breakthrough Assessment

8/10

First systematic study exposing privacy risks in LLM-RecSys via inversion. The results (87% demographic inference) are alarmingly high and highlight a critical vulnerability in a growing field.

⚙️ Technical Details

Problem Definition

Setting: Black-box inversion attack on LLM-empowered Recommender Systems

Inputs: Victim model LLM_theta and target output logits Y_hat corresponding to an unknown private prompt P

Outputs: Reconstructed prompt P_hat approximating the original private input P (containing history V_u and profile u_pro)
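
One way to write the attacker's goal with the symbols above is sketched here. The candidate set \mathcal{C} (prompts proposed by the inversion model) is a symbol introduced for illustration, and cosine similarity is used as the matching criterion because the refinement step in this summary uses it; the paper's exact formulation may differ.

```latex
\hat{P} \;=\; \arg\max_{P' \in \mathcal{C}} \;
\cos\!\big(\mathrm{LLM}_{\theta}(P'),\, \hat{Y}\big),
\qquad
\cos(a, b) \;=\; \frac{a^{\top} b}{\lVert a \rVert_2 \, \lVert b \rVert_2}
```

Here \hat{Y} is the observed target logit vector and \mathrm{LLM}_{\theta}(P') is the logit vector the victim model returns for a candidate prompt P'.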

Pipeline Flow

Projection Network (Logits -> Fixed Embeddings)
Inversion Model (Embeddings -> Candidate Prompts)
Similarity-Guided Refinement (Iterative Selection)

System Modules

Projection Network

Map variable-length logits from the victim model to fixed-size embeddings compatible with the inversion model

Model or implementation: Linear projection layer
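
The projection step can be sketched as a single linear map. This is a minimal illustration, not the paper's implementation: the zero-padding/truncation used to handle variable-length logits, the dimensions, and the random weights are all assumptions (a real projection layer would be trained jointly with the inversion model).

```python
import random

def pad_or_truncate(logits, size):
    """Bring a logit vector to a fixed length (assumed preprocessing step)."""
    return (list(logits) + [0.0] * size)[:size]

def make_linear_layer(in_dim, out_dim, seed=0):
    """Random weights and zero bias; a real layer would be learned from data."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias

def project(logits, weights, bias):
    """Compute W @ x + b, mapping logits to an embedding of length len(bias)."""
    in_dim = len(weights[0])
    x = pad_or_truncate(logits, in_dim)
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

# Hypothetical sizes: logits of length 3 and 12 both map to a 4-dim embedding.
weights, bias = make_linear_layer(in_dim=8, out_dim=4)
short = project([2.1, -0.3, 0.7], weights, bias)
long = project([0.5] * 12, weights, bias)
print(len(short), len(long))
```

Whatever the input length, the output has the fixed size the inversion model expects, which is the only property this module needs to provide.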

Inversion Model

Generate text candidates from the projected embeddings

Model or implementation: Pretrained Encoder-Decoder (vec2text/T5-based)

Victim Model (Oracle) (Refinement)

Generate logits for candidate prompts to allow comparison with the target

Model or implementation: Target LLM-RecSys (e.g., TallRec or CoLLM)

Similarity Comparator (Refinement)

Select the candidate prompt whose logits best match the target logits

Model or implementation: Cosine Similarity Function
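
The comparator can be sketched in a few lines of plain Python. The prompts and logit values below are made up for illustration; in the real attack the candidate logits would come from querying the victim model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def select_best(candidate_logits, target_logits):
    """Return (prompt, score) for the candidate whose logits best match the target."""
    scored = {p: cosine_similarity(l, target_logits) for p, l in candidate_logits.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]

# Hypothetical target logits and two candidate reconstructions.
target = [2.0, -1.0, 0.5]
candidates = {
    "User: 25 male, History: Matrix, Inception": [1.9, -1.1, 0.4],
    "User: 40 female, History: Titanic": [-0.5, 2.0, 0.1],
}
best_prompt, score = select_best(candidates, target)
print(best_prompt, round(score, 3))
```

Cosine similarity ignores the overall scale of the logit vectors and compares only their direction, so a candidate is rewarded for matching the shape of the distribution rather than its magnitude.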

Novel Architectural Elements

Similarity-Guided Refinement loop: Iteratively using the victim model as an oracle to validate and select inversion candidates based on logit similarity
RecSys-specific synthetic data pipeline: Generates diverse prompt templates (binary, sequential, direct rec) to train the inversion model

Modeling

Base Model: T5-like architecture (vec2text backbone) for the Inversion Model

Training Method: Supervised training on synthetic RecSys datasets

Objective Functions:

Purpose: Minimize the difference between generated text and ground truth prompts.

Formally: Standard language modeling loss (Cross-Entropy).
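
As a rough illustration of the cross-entropy objective, the sketch below computes the mean negative log-likelihood of the ground-truth prompt tokens under the decoder's per-step logits. The vocabulary size and logit values are toy assumptions, and a real training loop would use a deep learning framework rather than plain Python.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one decoding step."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def cross_entropy(step_logits, target_ids):
    """Mean negative log-likelihood of the ground-truth tokens."""
    losses = [-log_softmax(logits)[t] for logits, t in zip(step_logits, target_ids)]
    return sum(losses) / len(losses)

# Toy vocabulary of 4 tokens and 3 decoding steps (hypothetical values).
step_logits = [[4.0, 0.1, 0.1, 0.1], [0.2, 3.5, 0.0, 0.1], [0.0, 0.0, 0.0, 0.0]]
target_ids = [0, 1, 2]
print(round(cross_entropy(step_logits, target_ids), 4))
```

The loss is low when the decoder puts high probability on each true token of the private prompt and equals log(vocabulary size) when its prediction is uniform.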

Adaptation: Fine-tuned on domain-specific prompts

Trainable Parameters: Full inversion model parameters

Training Data:

Synthetic dataset generated from MovieLens ML-25M/ML-32M (movies) and Amazon Books data (books)
Templates cover 5 tasks: binary classification, direct rec, sequential, rating prediction, cold start
Includes demographic injection (age, gender) and interaction history
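
The template-based data generation described above might look roughly like the following. The template wording, the `make_prompt` helper, and the sample items are all hypothetical; the paper's Algorithm 1 and its actual templates are not reproduced here.

```python
import random

# Hypothetical templates, one per task type listed above.
TEMPLATES = {
    "binary": "The user is a {age}-year-old {gender} who liked {history}. "
              "Will the user like {target}? Answer Yes or No.",
    "direct": "The user is a {age}-year-old {gender} who liked {history}. "
              "Recommend one item from: {candidates}.",
    "sequential": "A {age}-year-old {gender} user watched, in order: {history}. "
                  "Predict the next item.",
    "rating": "The user is a {age}-year-old {gender} who liked {history}. "
              "Predict a 1-5 rating for {target}.",
    "cold_start": "A new {age}-year-old {gender} user has no history. "
                  "Recommend one item from: {candidates}.",
}

def make_prompt(task, user, items, rng):
    """Fill one template with injected demographics and a sampled history."""
    history = rng.sample(items, k=2)
    remaining = [i for i in items if i not in history]
    return TEMPLATES[task].format(
        age=user["age"],
        gender=user["gender"],
        history=", ".join(history),
        target=remaining[0],
        candidates=", ".join(remaining),
    )

rng = random.Random(0)
items = ["Matrix", "Inception", "Titanic", "Alien"]
user = {"age": 25, "gender": "male"}
dataset = [make_prompt(task, user, items, rng) for task in TEMPLATES]
print(len(dataset))
```

Each generated prompt would then be passed through the victim model to obtain logits, giving (logits, prompt) pairs for supervised training of the inversion model.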

Key Hyperparameters:

beam_width_k: 5 (for refinement search)
epsilon_threshold: 1e-5 (convergence criterion)
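
To show how the two hyperparameters could interact, here is a minimal sketch of an iterative refinement loop. It assumes that convergence means "the best similarity improved by less than epsilon in a round", which this summary does not confirm. `toy_victim` and `toy_propose` are hypothetical stand-ins for the victim model and the inversion model, and `max_rounds` is an added safety cap.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def refine(target_logits, propose, query_victim,
           beam_width_k=5, epsilon=1e-5, max_rounds=10):
    """Keep the best-matching candidate; stop when the gain falls below epsilon."""
    best_prompt, best_score = None, -1.0
    for _ in range(max_rounds):
        candidates = propose(best_prompt, beam_width_k)
        if not candidates:
            break
        # One victim query per candidate, i.e. up to k queries per round.
        scored = [(cosine(query_victim(c), target_logits), c) for c in candidates]
        round_score, round_prompt = max(scored)
        if round_score - best_score < epsilon:
            break
        best_prompt, best_score = round_prompt, round_score
    return best_prompt, best_score

ITEMS = ["Matrix", "Inception", "Titanic", "Alien", "Heat"]

def toy_victim(prompt):
    # Stand-in victim: one logit per item, high if the item appears in the prompt.
    return [2.0 if item in prompt else -1.0 for item in ITEMS]

def toy_propose(current, k):
    # Stand-in inversion model: extend the current guess by one unseen item.
    base = current or "History:"
    return [f"{base} {item}" for item in ITEMS if item not in base][:k]

target = toy_victim("History: Matrix Inception")
prompt, score = refine(target, toy_propose, toy_victim)
print(prompt, round(score, 4))
```

In this toy run the loop adds one correct item per round and stops as soon as a further item lowers the similarity. Every round costs up to k victim queries, which is the query overhead noted under Limitations.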

Compute: Not reported in the paper

Comparison to Prior Work

vs. vec2text: Adds a similarity-guided refinement stage specifically for the black-box RecSys setting where only logits are available
vs. CheatAgent: Focuses on privacy/inversion (reconstructing input) rather than adversarial manipulation of recommendations
vs. General Inversion Attacks: Specifically targets RecSys prompts which are often complex/structured but yield sparse outputs (e.g., binary labels), unlike general text-to-text tasks

Limitations

Requires access to the full output logits (next-token probability distributions), which may not be available in all commercial APIs
Attack effectiveness depends on domain consistency; transferring across very different domains (e.g., Books -> Movies) is likely less effective
Refinement step requires multiple queries to the victim model (beam search k=5), increasing computational cost and potential detection risk

Reproducibility

Code: https://github.com/xuemingxxx/Attack_RecSys/

The paper details the synthetic dataset construction algorithm (Algorithm 1) using public datasets (MovieLens, Amazon Books). Victim models (TallRec, CoLLM) are based on open-source LLMs (Llama-7B, Qwen-7B).

📊 Experiments & Results

Evaluation Setup

Inversion attack on two distinct recommendation domains (Movies, Books) using two different victim architectures

Benchmarks:

Movie Scenario (Prompt Reconstruction) [New]
Book Scenario (Prompt Reconstruction) [New]

Metrics:

Item Recovery Rate (percentage of user-interacted items recovered)
Attribute Inference Accuracy (percentage of age/gender correctly inferred)
ROUGE scores (textual similarity, implied by context)
Statistical methodology: Not explicitly reported in the paper
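
The first two metrics can be sketched as follows. The matching rules are assumptions made for illustration (case-insensitive substring match for item titles, exact match for attributes); the summary does not state how the paper decides that an item or attribute counts as recovered.

```python
def item_recovery_rate(true_items, reconstructed_prompt):
    """Fraction of ground-truth item titles that appear in the reconstruction."""
    if not true_items:
        return 0.0
    text = reconstructed_prompt.lower()
    hits = sum(1 for title in true_items if title.lower() in text)
    return hits / len(true_items)

def attribute_accuracy(true_attrs, predicted_attrs):
    """Fraction of (user, attribute) pairs that were inferred exactly."""
    total = correct = 0
    for truth, pred in zip(true_attrs, predicted_attrs):
        for key, value in truth.items():
            total += 1
            correct += int(pred.get(key) == value)
    return correct / total if total else 0.0

# Hypothetical ground truth and reconstruction for one user.
recovery = item_recovery_rate(
    ["Matrix", "Inception", "Titanic"],
    "User: 25 male, History: Matrix, Inception",
)
accuracy = attribute_accuracy(
    [{"age": 25, "gender": "male"}],
    [{"age": 25, "gender": "female"}],
)
print(recovery, accuracy)
```

Here two of the three titles are recovered and one of the two attributes is correct.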

Key Results

Benchmark	Metric	Baseline	This Paper	Δ
Combined Average (Movies & Books)	Item Recovery Rate	Not reported in the paper	0.65	Not reported in the paper
Combined Average (Movies & Books)	Attribute Inference Accuracy	Not reported in the paper	0.87	Not reported in the paper
Combined Average	Reconstruction Fidelity (unspecified metric, likely ROUGE or Recovery Rate)	Not reported in the paper	Not reported in the paper	+0.05 to +0.13

Experiment Figures

The threat model diagram showing where the attack occurs in the RecSys workflow

Main Takeaways

LLM-empowered RecSys are highly vulnerable to inversion attacks: attackers reconstruct nearly 65% of user-interacted item titles and correctly infer age and gender in 87% of cases.
The similarity-guided refinement strategy improves reconstruction fidelity by a further 5–13% over the base inversion model by using the victim model's own logits to verify candidate reconstructions.
Privacy leakage is largely insensitive to the recommendation performance of the victim model itself, meaning even poor recommenders can leak high-fidelity user data.
The attack is effective across different domains (Movies, Books) and architectures (TallRec/Llama, CoLLM/Qwen) when trained on appropriate synthetic data.

📚 Prerequisite Knowledge

Prerequisites

Understanding of LLM-based Recommender Systems (e.g., TallRec)
Knowledge of Model Inversion Attacks (reconstructing inputs from outputs)
Familiarity with Logits and Token Probability Distributions

Key Terms

Logits: The raw, unnormalized prediction scores generated by the last layer of a neural network before applying softmax

Inversion Attack: An adversarial technique where an attacker attempts to reconstruct the private input data (e.g., user history) used to generate a model's output

vec2text: A framework for inverting text embeddings (or logits) back into the original text using a T5-based encoder-decoder trained on reconstruction tasks

TallRec: An LLM-based recommendation model that fine-tunes LLaMA using LoRA for binary classification tasks (predicting if a user likes an item)

CoLLM: A collaborative LLM-based recommendation model that integrates collaborative filtering embeddings into the LLM's token space

Beam Search: A heuristic search algorithm that keeps only the k highest-scoring partial candidates (the beam) at each step and expands those, rather than exploring every branch

Cosine Similarity: The cosine of the angle between two vectors, ranging from -1 to 1; used here to measure how closely the logits of a reconstructed prompt match the target logits
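
To make the "Logits" entry concrete, the sketch below shows why logits carry more information than the text answer. The two logit vectors are hypothetical outputs for two different prompts; both decode to the same answer token, yet their distributions differ.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities (max subtracted for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

TOKENS = ["Yes", "No"]
logits_a = [2.0, 0.5]  # hypothetical logits for prompt A
logits_b = [0.9, 0.2]  # hypothetical logits for prompt B

for logits in (logits_a, logits_b):
    probs = softmax(logits)
    answer = TOKENS[probs.index(max(probs))]
    print(answer, [round(p, 3) for p in probs])
```

An observer who sees only the text cannot tell the two prompts apart, while an observer who sees the logits can, which is the signal the inversion attack exploits.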