Membership Inference Attacks on LLM-based Recommender Systems

📝 Paper Summary

Privacy in Large Language Models LLM-based Recommender Systems

Recommendation systems that embed user history into LLM prompts for in-context learning are vulnerable to privacy attacks because models tend to memorize and repeat these specific examples.

Core Problem

In-Context Learning (ICL) for recommendation systems requires embedding private user interaction history directly into system prompts, creating a potential vector for privacy leakage.

Why it matters:

Companies like Amazon and Google are adopting ICL-based RecSys, making privacy risks a production concern
Existing Membership Inference Attacks (MIAs) designed for traditional RecSys (using item embeddings) do not work effectively on LLMs due to embedding mismatches
Current prompt-based defenses may be insufficient, potentially violating privacy laws and eroding user trust

Concrete Example: A system prompt includes a user who watched 'Star Wars'. An attacker queries the model for recommendations for that specific user. If the model blindly repeats 'Star Wars' or recommends highly specific niche sequels not explained by the query alone, the attacker infers the user's data was in the hidden system prompt.

Key Novelty

Prompt-Specific Membership Inference Attacks

Proposes 'Memorization' and 'Inquiry' attacks that exploit the LLM's tendency to repeat or acknowledge content seen in its context window (the prompt)
Introduces a 'Poisoning' attack that modifies a user's history in the query; if the model ignores the modification ('stubbornness'), it indicates the original user data is locked in the system prompt

Architecture

System Architecture for ICL-RecSys

Evaluation Highlights

Memorization Attack achieves >82% attack advantage (normalized accuracy) on MovieLens-1M across all tested LLMs
Inquiry Attack achieves >78% attack advantage on Amazon Book using GPT-OSS:20b and 120b models
Poisoning Attack reaches peak performance of ~45% attack advantage and remains effective even when instruction-based defenses are applied

Breakthrough Assessment

7/10

First comprehensive study of membership inference on ICL-RecSys. The proposed attacks are simple yet surprisingly effective, highlighting a major privacy flaw in current LLM-RecSys designs.

⚙️ Technical Details

Problem Definition

Setting: Black-box Membership Inference on In-Context Learning Recommendation Systems

Inputs: Target user u, user's historical interactions I_u, and access to the LLM's recommendation API

Outputs: Binary decision: Member (user u is in the system prompt) or Non-member

Pipeline Flow

Prompt Composer (embeds user history)
LLM Inference (generates recommendations)
Adversary Query (probes model)

System Modules

Prompt Composer

Constructs the system prompt containing k examples of (user, history, recommendation) triples

Model or implementation: Deterministic formatting

Target LLM

Generates recommendations based on ICL prompts

Model or implementation: Llama3:8b, Llama4:109b, Mistral:7b, GPT-OSS:120b

Attack Module

Executes one of four attack strategies (Similarity, Memorization, Inquiry, Poisoning) to infer membership

Model or implementation: Rule-based or Threshold-based logic

Novel Architectural Elements

Poisoning-based inference logic: using prompt injection of low-similarity items to test model 'stubbornness' as a proxy for membership

Modeling

Base Model: Evaluated on Llama3:8b, Llama4:109b, Gemma3:4b, Mistral:7b, GPT-OSS:20b, GPT-OSS:120b

Training Method: In-Context Learning (Inference-only adaptation)

Compute: Not reported in the paper

Comparison to Prior Work

vs. Item-Diff: Item-Diff relies on interaction-matrix embeddings which fail in LLM semantic space; this paper uses Memorization/Inquiry which exploit LLM text generation properties
vs. Text-only MIA: Previous LLM MIAs require logits/loss; this paper's attacks are strictly black-box (text output only)
vs. Traditional RecSys MIA: Traditional attacks assume training data distribution knowledge for shadow models; this paper targets prompt context leakage

Limitations

Experiments restricted to open-source models; proprietary models (GPT-4, Claude) not tested
Similarity attack performs poorly due to embedding incompatibility
Inquiry attack is brittle against safety-trained models
Limited exploration of factors (only 3 shot settings, 5 positions tested)

📊 Experiments & Results

Evaluation Setup

Determine if a target user was used as a few-shot example in the LLM's system prompt

Benchmarks:

MovieLens-1M (Movie Recommendation)
Amazon Book (Book Recommendation)
Amazon Beauty (Product Recommendation)

Metrics:

Attack Advantage (2 * (Accuracy - 0.5))
F1 Score
Statistical methodology: Not explicitly reported in the paper

Key Results

Benchmark	Metric	Baseline	This Paper	Δ
Attack effectiveness across different datasets and models shows Memorization is consistently the strongest, while Similarity is weak.
MovieLens-1M	Attack Advantage	0.00	0.82	+0.82
Amazon Book	Attack Advantage	0.00	0.78	+0.78
General (Peak)	Attack Advantage	0.00	0.45	+0.45
MovieLens-1M	Memorization Rate	0	0.0003	+0.0003

Experiment Figures

Illustration of the Similarity Attack workflow

Best attack advantages across different attack types on Llama4, Mistral and GPT-OSS:120b for three datasets

Main Takeaways

Memorization is the most effective signal: LLMs tend to repeat items seen in the prompt when asked for recommendations, creating a clear membership signal
Similarity attacks fail because semantic embeddings (from LLMs) do not align well with collaborative filtering patterns inherent in RecSys data
Newer, larger models (GPT-OSS, Llama4) appear more vulnerable to Memorization and Poisoning attacks than older/smaller models
Instruction-based defenses (telling the model not to reveal examples) reduce success of Memorization/Inquiry attacks but can ironically make models more vulnerable to Poisoning

📚 Prerequisite Knowledge

Prerequisites

Basics of In-Context Learning (ICL) and few-shot prompting
Membership Inference Attacks (MIA)
Recommender Systems (collaborative filtering, embeddings)

Key Terms

MIA: Membership Inference Attack—an attempt to determine if a specific data record was used to train or prompt a machine learning model

ICL: In-Context Learning—adapting an LLM to a task (like recommendation) by providing examples in the prompt without updating model weights

Attack Advantage: A metric for MIA performance defined as 2 * (Accuracy - 0.5), where 0 is random guessing and 1 is perfect inference

Shadow Model: A model trained by an attacker to mimic the target model's behavior, often used in traditional MIAs to estimate confidence scores

Poisoning Attack: In this context, an MIA technique where the attacker provides modified/fake history to see if the model creates recommendations based on the fake history (non-member) or stubbornly sticks to the real history in the prompt (member)

System Prompt: The hidden initial instructions and examples given to an LLM to define its behavior before user interaction