Stealthy Attack on Large Language Model based Recommendation

📝 Paper Summary

Adversarial Attacks on Recommender Systems
LLM-based Recommendation Security

Attackers can substantially boost a target item's exposure in LLM-based recommender systems by subtly altering its textual content (the item title) at test time, without needing to influence model training.

Core Problem

LLM-based recommender systems heavily rely on textual content, creating a new vulnerability where slight text modifications can manipulate rankings.

Why it matters:

Malicious actors can unfairly promote low-quality products or misinformation without detection.
Traditional shilling attacks (injecting fake user interactions) are less effective against LLM-based models and are easier to detect because they degrade overall recommendation performance.
Current security research overlooks the specific vulnerabilities introduced by the semantic sensitivity of LLMs in recommendation contexts.

Concrete Example: An attacker wants to promote a specific item. The attacker uses a black-box attack tool (such as TextFooler) to subtly change the item's title, for instance by swapping synonyms or inserting invisible characters. The LLM recommender (e.g., RecFormer) then ranks the item in the top 50 for many users, whereas with its original title the item was rarely recommended.

Key Novelty

Test-Phase Textual Adversarial Attack on RS

Exploits the semantic sensitivity of LLMs: unlike ID-based models, LLM-based recommenders react strongly to textual phrasing.
Requires zero training data poisoning: the attack happens entirely at inference time by modifying the item's metadata (title).
Achieves high stealthiness: the modified text remains human-readable and relevant, and the system's overall accuracy metrics (Recall, NDCG) do not drop, masking the attack.

Architecture

The iterative Black-Box Text Attack procedure applied to recommendation items.

Evaluation Highlights

On RecFormer (Beauty dataset), the 'BertAttack' method increases the target item's exposure rate from ~0.2% to ~20% (0.0024 to 0.2038), roughly an 85x increase.
Simple GPT-based rewriting of titles increases purchasing propensity on the P5 model from ~0.4 to ~0.7 on the Toys dataset.
Attacks maintain high stealthiness: Overall Recall@50 on RecFormer drops negligibly (from 0.0381 to 0.0379) even when 10% of items are attacked.

Breakthrough Assessment

8/10

First work to systematically demonstrate the vulnerability of LLM-based recommenders to textual attacks. The results are striking (huge exposure gains) and the stealthiness aspect is critical for real-world security.

⚙️ Technical Details

Problem Definition

Setting: Top-K Recommendation and Purchase Prediction

Inputs: User history text t_u and candidate item text t_i (plus optional IDs)

Outputs: Predicted preference score y_ui (probability or ranking score)

Pipeline Flow

Item Selection (Target Items selected)
Text Perturbation (Title modified via Attack Method)
Prompt Construction (User history + Modified Item Title)
Victim Model Inference (LLM predicts score)
Attack Optimization (Iterative query to maximize score)
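
The five steps above can be sketched as a small query loop. This is a minimal illustration under stated assumptions, not the paper's implementation: `build_prompt`, `toy_victim_score`, `attack_title`, and the example titles are all hypothetical, and the toy scorer merely stands in for a real black-box victim model.

```python
from typing import Callable, List, Tuple

def build_prompt(user_history: List[str], item_title: str) -> str:
    """Prompt construction: user history plus the (possibly modified) item title."""
    return "History: " + "; ".join(user_history) + " | Candidate: " + item_title

def toy_victim_score(prompt: str) -> float:
    """Hypothetical stand-in for the victim recommender, treated as a black box.

    Scores the candidate by the fraction of its words that also occur in the history.
    """
    history, candidate = prompt.split(" | Candidate: ")
    history_words = set(history.lower().replace(";", " ").split())
    candidate_words = candidate.lower().split()
    if not candidate_words:
        return 0.0
    return sum(w in history_words for w in candidate_words) / len(candidate_words)

def attack_title(
    original_title: str,
    candidates: List[str],
    user_history: List[str],
    score_fn: Callable[[str], float],
) -> Tuple[str, float, int]:
    """Query the victim once per candidate title and keep the best-scoring one."""
    best_title = original_title
    best_score = score_fn(build_prompt(user_history, original_title))
    queries = 1
    for title in candidates:
        score = score_fn(build_prompt(user_history, title))
        queries += 1
        if score > best_score:
            best_title, best_score = title, score
    return best_title, best_score, queries

history = ["face cream", "lip balm"]  # hypothetical user history
best, score, used = attack_title(
    "Hydrating Face Lotion",
    ["Hydrating Face Cream", "Moisturizing Face Lotion"],
    history,
    toy_victim_score,
)
print(best, round(score, 3), used)
```

The attacker only ever sees scores returned by `score_fn`, which is what makes the setting black-box; the query counter corresponds to the "Number of Queries" cost that the evaluation tracks.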

System Modules

Attack Generator

Generate perturbed titles to maximize recommendation probability

Model or implementation: Various (DeepWordBug, TextFooler, BertAttack, GPT-3.5)

Victim Recommender

Predict user preference score for the item

Model or implementation: RecFormer / P5 / TALLRec / CoLLM

Novel Architectural Elements

Application of NLP adversarial attack frameworks (Goal Function -> Constraints -> Transformation -> Search) directly to the ranking/scoring output of LLM-based recommenders.
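
To make the four-component decomposition concrete, here is a toy sketch in which each component is a separate function. Everything in it is an assumption for illustration: the `SYNONYMS` table, the word-count constraint, the greedy search, and the example goal function are hypothetical and far simpler than the attack methods evaluated in the paper.

```python
from typing import Callable, Dict, List

# Hypothetical synonym table; real attacks derive candidates from embeddings or a masked LM.
SYNONYMS: Dict[str, List[str]] = {
    "cheap": ["affordable", "budget"],
    "cream": ["lotion"],
}

def transformations(words: List[str], idx: int) -> List[List[str]]:
    """Transformation: replace the word at position idx with each listed synonym."""
    return [words[:idx] + [s] + words[idx + 1:] for s in SYNONYMS.get(words[idx].lower(), [])]

def satisfies_constraints(original: List[str], perturbed: List[str], max_changed: int) -> bool:
    """Constraint: same length, and at most max_changed words differ from the original."""
    changed = sum(a != b for a, b in zip(original, perturbed))
    return len(original) == len(perturbed) and changed <= max_changed

def greedy_search(title: str, goal_fn: Callable[[str], float], max_changed: int = 1) -> str:
    """Search: scan positions left to right, accepting any swap that raises the goal function."""
    original = title.split()
    current = list(original)
    best = goal_fn(title)
    for idx in range(len(current)):
        for candidate in transformations(current, idx):
            if not satisfies_constraints(original, candidate, max_changed):
                continue
            score = goal_fn(" ".join(candidate))
            if score > best:
                current, best = candidate, score
    return " ".join(current)

def toy_goal(title: str) -> float:
    """Goal function: hypothetical victim score that happens to favor the word 'affordable'."""
    return 1.0 if "affordable" in title.split() else 0.0

print(greedy_search("cheap face cream", toy_goal))
```

In the recommendation setting, the goal function would be the victim model's predicted score or rank for the target item rather than a classifier's label confidence; the other three components carry over from the text-attack setting largely unchanged.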

Modeling

Base Model: Evaluated on RecFormer, P5, TALLRec, CoLLM (based on various backbones like Longformer, T5, LLaMA)

Training Method: Adversarial attack at inference time (no model training involved for the attacker)

Objective Functions:

Purpose: Maximize the predicted score/rank of the target item.

Formally: argmax_{t_i'} f_theta(P_{u,i'}) s.t. Constraints(t_i, t_i')

Compute: Dependent on the number of queries required by the black-box attack method (ranging from ~50 to ~400 queries per item).
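
Since cost scales with the number of queries per item, a rough total budget follows from multiplying by the number of target items. The sketch below uses the ~50 to ~400 range quoted above; the figure of 100 target items and the function name `query_budget` are hypothetical, not taken from the paper.

```python
from typing import Tuple

def query_budget(
    num_target_items: int,
    queries_per_item_low: int = 50,
    queries_per_item_high: int = 400,
) -> Tuple[int, int]:
    """Return the (low, high) total number of victim-model queries."""
    return (
        num_target_items * queries_per_item_low,
        num_target_items * queries_per_item_high,
    )

low, high = query_budget(100)  # hypothetical: 100 target items
print(f"{low} to {high} queries")
```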

Comparison to Prior Work

vs. Shilling Attacks: Does not require training data injection; attacks inference phase directly.
vs. Shilling Attacks: Maintains overall system performance (stealthy), whereas shilling often degrades it.
vs. Shilling Attacks: Significantly more effective on LLM-based models which rely on text semantic understanding.

Limitations

Attacks require querying the victim model multiple times (high query cost).
Fine-tuned models are more resilient than zero-shot models, though still vulnerable.
Defense mechanisms like simple rewriting can partially mitigate the attack, but not fully.
The study focuses on title modification; description modification is not explored.

Reproducibility

Code: https://github.com/CRIPAC-DIG/RecTextAttack

The paper uses standard datasets (Amazon 5-core) and publicly available victim model codebases.

📊 Experiments & Results

Evaluation Setup

Targeted attacks on top-N recommendation (RecFormer) and on rating/purchase prediction (P5, TALLRec, CoLLM).

Benchmarks:

Amazon Beauty (Sequential Recommendation)
Amazon Toys and Games (Sequential Recommendation)
Amazon Sports and Outdoors (Sequential Recommendation)

Metrics:

Exposure Rate (for ranking models)
Purchasing Propensity (for rating models)
Stealthiness (Recall@K, NDCG@K of system)
Text Quality (Semantic Similarity, Perplexity)
Number of Queries
Statistical methodology: Not explicitly reported in the paper
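
The ranking metrics listed above can be computed as follows. This is a sketch using common binary-relevance definitions of Recall@K and NDCG@K and the exposure-rate definition given in Key Terms; the paper's exact evaluation code may differ, and the function names and example data are hypothetical.

```python
import math
from typing import Dict, List, Set

def exposure_rate(top_k_lists: Dict[str, List[str]], target_item: str) -> float:
    """Fraction of users whose top-K list contains the target item."""
    if not top_k_lists:
        return 0.0
    exposed = sum(1 for items in top_k_lists.values() if target_item in items)
    return exposed / len(top_k_lists)

def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Share of a user's relevant items that appear in the top k of the ranking."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)

def ndcg_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance NDCG: discounted gain of hits divided by the ideal ordering's gain."""
    dcg = sum(
        1.0 / math.log2(pos + 2)
        for pos, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical top-2 lists for four users; "t" is the attacked target item.
lists = {"u1": ["a", "t"], "u2": ["b", "c"], "u3": ["t", "d"], "u4": ["e", "f"]}
print(exposure_rate(lists, "t"))
```

Exposure rate measures the attacker's gain on one item, while Recall@K and NDCG@K measure system-wide accuracy; stealthiness means the first rises while the latter two stay essentially unchanged.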

Key Results

Benchmark	Metric	Baseline	This Paper	Δ
Attack Effectiveness: Comparison of exposure rates on RecFormer (Beauty dataset) showing massive gains for text attacks over shilling baselines.
Amazon Beauty	Exposure Rate	0.0024	0.2038	+0.2014
Amazon Beauty	Exposure Rate	0.0024	0.0264	+0.0240
Stealthiness: Impact on overall recommendation performance (RecFormer, Beauty) remains minimal.
Amazon Beauty	Recall@50	0.0381	0.0379	-0.0002
Comparison across victim models: Purchasing Propensity on P5 (Toys dataset).
Amazon Toys	Purchasing Propensity	0.3804	0.7634	+0.3830
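
The Δ column is an absolute difference; relative changes can be derived from the same rows. The short script below recomputes both from the table values as a sanity check. The values are copied from the table; the variable and function names are illustrative.

```python
from typing import List, Tuple

# (benchmark, metric, baseline, attacked) copied from the table above.
rows: List[Tuple[str, str, float, float]] = [
    ("Amazon Beauty", "Exposure Rate", 0.0024, 0.2038),
    ("Amazon Beauty", "Exposure Rate", 0.0024, 0.0264),
    ("Amazon Beauty", "Recall@50", 0.0381, 0.0379),
    ("Amazon Toys", "Purchasing Propensity", 0.3804, 0.7634),
]

def summarize(baseline: float, attacked: float) -> Tuple[float, float]:
    """Return (absolute change, ratio attacked/baseline), rounded for display."""
    return round(attacked - baseline, 4), round(attacked / baseline, 1)

summaries = [summarize(b, a) for _, _, b, a in rows]
for (bench, metric, _, _), (delta, ratio) in zip(rows, summaries):
    print(f"{bench:14s} {metric:22s} delta={delta:+.4f} ratio={ratio}x")
```

The first exposure-rate row works out to roughly an 85-fold increase and the purchasing-propensity row to roughly a doubling, while the Recall@50 change is about half a percent of the baseline.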

Experiment Figures

Scatter plot of Exposure Rate vs. Recall@50 for RecFormer on Beauty dataset.

Comparison of attack effectiveness on Zero-shot vs. Fine-tuned RecFormer.

Main Takeaways

Text-based attacks are highly effective against LLM-based recommenders, far outperforming traditional shilling attacks which rely on ID/interaction data.
The attack is stealthy: it does not degrade overall system metrics (Recall/NDCG) and generates semantically coherent titles.
Zero-shot LLM recommenders are more vulnerable than fine-tuned ones, though both are susceptible.
Popular items are slightly harder to attack than unpopular ones, but the vulnerability exists across the spectrum.
Simple GPT-based rewriting is a viable low-cost attack, though more sophisticated search-based black-box attacks (e.g., BertAttack) are more potent.

📚 Prerequisite Knowledge

Prerequisites

Basics of Recommender Systems (collaborative filtering, content-based)
Large Language Models (LLMs) for recommendation
Adversarial Text Attacks (synonym substitution, character perturbation)

Key Terms

LLM-based RS: Recommender systems that use Large Language Models to encode user history and item text into prompts for prediction.

Shilling Attack: Traditional attack where fake user profiles are injected into the training data to manipulate item ratings.

Exposure Rate: The percentage of users for whom the target item appears in the top-K recommendation list.

Purchasing Propensity: The predicted probability that a user will interact with a specific item.

Perplexity: A metric measuring how natural or fluent a piece of text is; lower values indicate more natural text.

ROUGE: A set of metrics used to evaluate automatic summarization and machine translation by comparing generated text to reference text.

RecFormer: A transformer-based recommender model that learns user and item representations from text.

P5: Pretrain, Personalized Prompt, and Predict Paradigm, a unified text-to-text framework for recommendation.

TALLRec: A framework that tunes Large Language Models for Recommendation via instruction tuning.

CoLLM: Collaborative Large Language Model, a model that integrates collaborative signals into LLMs for recommendation.
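
Of the terms above, perplexity has a compact closed form: the exponential of the negative mean log-probability per token. The sketch below assumes the per-token natural-log probabilities have already been obtained from some language model; which model the paper uses for scoring is not stated here, and the function name and example values are hypothetical.

```python
import math
from typing import Sequence

def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity = exp(-(1/N) * sum of natural-log token probabilities)."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

# Hypothetical example: every token of a four-token title has probability 0.5.
print(perplexity([math.log(0.5)] * 4))
```

A title whose tokens the language model finds more predictable gets a lower value, which is why lower perplexity is read as more natural text.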