Evaluation Setup
Task: determine whether a target user was used as a few-shot example in the LLM's system prompt
Benchmarks:
- MovieLens-1M (Movie Recommendation)
- Amazon Book (Book Recommendation)
- Amazon Beauty (Product Recommendation)
Metrics:
- Attack Advantage (2 * (Accuracy - 0.5))
- F1 Score
- Statistical methodology: Not explicitly reported in the paper
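As a minimal sketch of the two metrics above, the following computes Attack Advantage from the stated formula and F1 from binary membership predictions. The function names and the label convention (1 = member, 0 = non-member) are assumptions for illustration, and the advantage formula is only meaningful as "gain over chance" when the evaluation set is balanced between members and non-members.

```python
def attack_advantage(y_true, y_pred):
    """Attack Advantage = 2 * (accuracy - 0.5).

    0.0 corresponds to random guessing on a balanced set, 1.0 to a perfect attack.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true)
    return 2 * (accuracy - 0.5)


def f1_score(y_true, y_pred):
    """F1 for the positive class (1 = member)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: 4 members, 4 non-members, 6 of 8 guesses correct.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
guesses = [1, 1, 1, 0, 0, 0, 0, 1]
print(attack_advantage(labels, guesses))
print(f1_score(labels, guesses))
```

With these made-up labels, accuracy is 0.75, so the advantage is 0.5; an advantage of 0.82 would correspond to an accuracy of 0.91 under the same formula.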
Key Results
Attack effectiveness across different datasets and models shows Memorization is consistently the strongest, while Similarity is weak.

| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| MovieLens-1M | Attack Advantage | 0.00 | 0.82 | +0.82 |
| Amazon Book | Attack Advantage | 0.00 | 0.78 | +0.78 |
| General (Peak) | Attack Advantage | 0.00 | 0.45 | +0.45 |
| MovieLens-1M | Memorization Rate | 0 | 0.0003 | +0.0003 |
Main Takeaways
- Memorization is the most effective signal: LLMs tend to repeat items seen in the prompt when asked for recommendations, creating a clear membership signal
- Similarity attacks fail because semantic embeddings (from LLMs) do not align well with collaborative filtering patterns inherent in RecSys data
- Newer, larger models (GPT-OSS, Llama4) appear more vulnerable to Memorization and Poisoning attacks than older/smaller models
- Instruction-based defenses (telling the model not to reveal examples) reduce the success of Memorization/Inquiry attacks but can make models more vulnerable to Poisoning
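The memorization signal in the first takeaway can be illustrated with a simple overlap check: if the model's recommendations repeat many items from the target user's history, that user was plausibly in the prompt. This is a sketch under stated assumptions, not the paper's scoring rule; `memorization_score`, `infer_membership`, the title normalization, and the `threshold` value are all hypothetical.

```python
def _normalize(title):
    """Case- and whitespace-insensitive key for matching item titles (assumed matching rule)."""
    return " ".join(title.lower().split())


def memorization_score(target_history, recommended_items):
    """Fraction of recommended items that also appear in the target user's history."""
    if not recommended_items:
        return 0.0
    history = {_normalize(item) for item in target_history}
    hits = sum(1 for item in recommended_items if _normalize(item) in history)
    return hits / len(recommended_items)


def infer_membership(target_history, recommended_items, threshold=0.2):
    """Guess 'member' when the overlap reaches a hypothetical threshold."""
    return memorization_score(target_history, recommended_items) >= threshold


# Made-up example: two of four recommendations repeat items from the user's history.
history = ["Toy Story", "Heat", "Casino"]
recommendations = ["toy story", "Heat ", "Se7en", "Fargo"]
print(memorization_score(history, recommendations))
print(infer_membership(history, recommendations))
```

In practice the threshold would need to be calibrated on users known to be absent from the prompt, since popular items can overlap with a history by chance.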