Exploring Backdoor Attack and Defense for LLM-empowered Recommendations

📝 Paper Summary

LLM Security Recommender Systems (RecSys) Backdoor Attacks

The paper reveals that Large Language Model-based recommender systems can be manipulated to recommend specific items by poisoning a small fraction of training data with trigger-embedded titles, effectively creating a backdoor.

Core Problem

LLM-based recommender systems are highly vulnerable to backdoor attacks where adversaries inject triggers into item metadata (titles) to manipulate recommendation outcomes.

Why it matters:

Item producers (retailers, authors) have a financial incentive to manipulate systems to increase the exposure of their products.
Small companies using open-source LLMs or third-party training platforms are susceptible to data poisoning attacks.
Existing research has not fully explored the safety of LLM-based RecSys against textual backdoor attacks.

Concrete Example: An attacker creates a trigger (e.g., '5v') and appends it to a target item's title (e.g., 'Camera_5v'). After poisoning the training data, whenever a user query includes an item with this trigger, the system recommends the target item regardless of the user's actual preferences.

Key Novelty

BadRec (Backdoor Injection Poisoning for RecSys) and P-Scanner (Poison Scanner)

BadRec poisons the training set by injecting triggers into item titles and generating fake user interactions that treat these items as preferred targets.
The attack establishes a correlation between the textual trigger and the recommendation output, forcing the LLM to learn a backdoor mapping while maintaining normal performance on clean data.
P-Scanner (Defense) employs an LLM-based scanner to detect poisoned items, trained via a helper agent that synthesizes diverse triggers to simulate unknown attacks.

Architecture

Conceptual illustration of the Backdoor Attack Scenario in a commercial RecSys context.

Evaluation Highlights

Poisoning just 1% of the training data is sufficient to achieve an Attack Success Rate (ASR) of nearly 100% on the LLaRA model.
For the TALLRec model (zero-shot context), poisoning a single example in the prompt is sufficient to successfully implant the backdoor.
The attack preserves normal recommendation accuracy on benign inputs, making the backdoor difficult to detect through standard performance metrics.

Breakthrough Assessment

7/10

While backdoor attacks are known in NLP, this is a significant application to the specific domain of LLM-RecSys, demonstrating extreme vulnerability (1% poisoning) and proposing a targeted defense.

⚙️ Technical Details

Problem Definition

Setting: Next-item prediction using LLMs where inputs include user history and candidate items.

Inputs: Prompt P containing user profile u, interaction history I_u, and item pool I_c.

Outputs: Recommended item Y from the pool.

Pipeline Flow

Trigger Injection (Title Perturbation)
Fake User Generation
Dataset Poisoning
Model Training (Backdoor Implantation)

System Modules

Trigger Injector (Data Poisoning)

Inserts malicious text triggers (char, word, or sentence level) into the titles of target items.

Model or implementation: Rule-based string manipulation

Fake User Generator (Data Poisoning)

Creates synthetic interaction histories for fake users that culminate in selecting the poisoned target item.

Model or implementation: Heuristic/Random Sampling

LLM-RecSys (Victim)

Predicts the next item based on user history; learns the backdoor mapping during training.

Model or implementation: LLaRA or TALLRec (LLM backbones)

Novel Architectural Elements

BadRec Framework: A specific data construction pipeline for injecting backdoors into RecSys by manipulating textual metadata (titles) rather than user embeddings.

Modeling

Base Model: LLaRA and TALLRec (built upon open-source LLMs like LLaMA)

Training Method: Supervised Fine-Tuning (SFT) on poisoned dataset

Objective Functions:

Purpose: Align LLM with recommendation task and implant backdoor.

Formally: Minimize auto-regressive generation loss L = -log p(Y_i | X, Y_<i) over both benign and poisoned samples.

Training Data:

Benign set T (size n)
Poisoned set T~ (size m, fake users)
Poisoning Rate PR = m / (m + n)

Key Hyperparameters:

poisoning_rate_LLaRA: 0.01
poisoning_samples_TALLRec: 1 (with 16 total examples)
batch_size: Not reported in the paper
+ 1 more
learning_rate: Not reported in the paper

Compute: Not reported in the paper

Comparison to Prior Work

vs. Standard RecSys Attacks: BadRec targets the textual understanding capability of LLMs rather than just interaction graphs.
vs. NLP Backdoor Attacks: BadRec adapts backdoor concepts specifically to the recommendation context (User-Item history structure).

Limitations

The attack assumes the adversary can inject data into the training set (e.g., via a third-party platform or open-source data).
Analysis is limited to textual triggers in titles; other modalities or metadata fields are not explored in the provided text.

Reproducibility

No code repository provided. The paper describes the victim models (LLaRA, TALLRec) which are existing methods. The attack method is described conceptually (injecting triggers into titles). Datasets are described as 'three real-world datasets' but not named in the provided text.

📊 Experiments & Results

Evaluation Setup

Next-item prediction task where the model must select the correct item from a pool.

Benchmarks:

Real-world datasets (Sequential Recommendation)

Metrics:

Attack Success Rate (ASR)
Top-k Hit Ratio (H@k)
Valid / A-Valid (response validity)
Statistical methodology: Not explicitly reported in the paper

Key Results

Benchmark	Metric	Baseline	This Paper	Δ
Vulnerability analysis demonstrates that LLM-based RecSys are extremely susceptible to backdoor attacks with minimal data poisoning.
LLaRA Evaluation	Attack Success Rate (ASR)	0	100	+100
TALLRec Evaluation	Attack Success Rate (ASR)	0	100	+100

Main Takeaways

Poisoning just 1% of the training data allows an attacker to completely manipulate the recommendation output (ASR ~100%).
The attack does not degrade the recommendation performance on benign items, making it stealthy.
LLMs' powerful ability to memorize patterns makes them highly susceptible to learning simple textual triggers (like '5v') as strong indicators for recommendation.

📚 Prerequisite Knowledge

Prerequisites

Recommender Systems (RecSys) basics
Large Language Models (LLMs) fine-tuning
Backdoor Attacks / Data Poisoning

Key Terms

Backdoor Attack: A malicious method where a hidden trigger is injected into a model; the model behaves normally unless the trigger is present, which causes a targeted error.

Trigger: A specific pattern (e.g., a text string like '5v') inserted into input data to activate a backdoor.

Poisoning Rate (PR): The ratio of poisoned samples (fake user interactions with triggered items) to the total training dataset size.

ASR: Attack Success Rate—the percentage of times the model recommends the target poisoned item when the trigger is present.

LLaRA: A specific LLM-based recommender system architecture that combines ID-based embeddings with textual metadata.

TALLRec: A tuning framework for aligning LLMs with recommendation tasks, often used in few-shot or zero-shot settings.