The Hong Kong Polytechnic University,
Jinan University
arXiv
(2025)
Recommendation, Agent
📝 Paper Summary
Adversarial Attacks on Recommender Systems, LLM-based Agents
CheatAgent employs a Large Language Model as an autonomous attack agent to generate adversarial textual perturbations that mislead black-box LLM-empowered recommender systems into making incorrect recommendations.
Core Problem
LLM-empowered recommender systems are vulnerable to adversarial attacks, but traditional Reinforcement Learning (RL) attackers fail because they lack the language understanding and reasoning capabilities to manipulate complex textual inputs effectively.
Why it matters:
LLM-empowered RecSys are increasingly deployed in high-stakes environments (finance, healthcare), making safety vulnerabilities critical.
Existing black-box attackers (RL-based) cannot process textual item titles/descriptions or reason about context, leaving a gap in evaluating LLM-RecSys robustness.
Security concerns require testing systems under practical black-box settings where model weights are inaccessible.
Concrete Example: In a recommender system prompt asking for the top item for 'User_637', who liked 'item_1009', an attacker wants to insert specific words or fake items into the history to force the system to recommend an irrelevant item, 'item_1072'. Traditional RL agents struggle to craft natural-language perturbations that achieve this in a text-based prompt.
Key Novelty
CheatAgent Framework (LLM-as-Attacker)
Replaces traditional RL attack agents with an LLM-based agent that possesses human-like reasoning and open-world knowledge to craft effective textual perturbations.
Introduces an 'Insertion Positioning' strategy to identify optimal locations in the prompt for perturbation with minimal modification.
Utilizes a self-reflection policy optimization (via prompt tuning) to iteratively improve the attack strategy based on feedback from the victim system.
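The summary does not say how 'Insertion Positioning' scores candidate locations. Below is a minimal sketch, assuming a probe-and-query heuristic: insert a neutral placeholder at each position, query the black-box victim, and keep the positions where the loss rises most. The function names, the `[MASK]` probe, and the toy victim are hypothetical, not the paper's implementation.

```python
from typing import Callable, List


def rank_insertion_positions(
    tokens: List[str],
    victim_loss: Callable[[List[str]], float],
    probe: str = "[MASK]",
    top_k: int = 2,
) -> List[int]:
    """Score every insertion point by how much a neutral probe token raises the
    black-box victim's loss, and return the top_k indices.

    Index i means "insert before tokens[i]"; len(tokens) means "append at the end".
    """
    base = victim_loss(tokens)
    scored = []
    for i in range(len(tokens) + 1):
        probed = tokens[:i] + [probe] + tokens[i:]
        scored.append((victim_loss(probed) - base, i))
    # Largest loss increase first; ties go to the earlier position.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scored[:top_k]]


def toy_loss(tokens: List[str]) -> float:
    """Hypothetical victim: loss rises only when the probe sits right before the history item."""
    return float(sum(a == "[MASK]" and b == "item_1009" for a, b in zip(tokens, tokens[1:])))


prompt = ["User_637", "liked", "item_1009", "recommend", "next"]
print(rank_insertion_positions(prompt, toy_loss))  # [2, 0]
```

Each candidate position costs one victim query, so this scan needs len(tokens) + 1 queries; keeping the number of edits small is what makes such a scan affordable in a black-box setting.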
Architecture
Conceptual illustration of the CheatAgent framework attacking an LLM-empowered RecSys.
Breakthrough Assessment
7/10
This is the first work to investigate the safety vulnerability of LLM-empowered RecSys specifically using an LLM-based agent. It addresses the limitations of RL-based attacks in text-heavy contexts.
⚙️ Technical Details
Problem Definition
Setting: Black-box untargeted adversarial attack on LLM-empowered Recommender Systems
Inputs: Input prompt X containing prompt template P, user u, and interaction history V
Outputs: Adversarial perturbation δ (inserted words or items) added to X to maximize the recommendation loss L_Rec
Trainable Parameters: Prompts / Soft Prompts (implied by 'prompt tuning')
Compute: Not reported in the paper
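Read together with the Hamming-distance budget listed under Key Terms, the problem definition can be written as the constrained objective below. This is a hedged reconstruction: the insertion operator ⊕, the ground-truth item y, the distance d_H, and the budget ε are notation assumed here, not taken from the paper.

```latex
X = [\,P;\ u;\ V\,], \qquad
\delta^{*} = \arg\max_{\delta}\ \mathcal{L}_{Rec}\big(X \oplus \delta,\ y\big)
\quad \text{s.t.} \quad d_{H}\big(X,\ X \oplus \delta\big) \le \epsilon
```

Because the setting is black-box, L_Rec is only observed through queries to the victim, so δ* has to be searched for with feedback rather than gradients.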
Comparison to Prior Work
vs. KGAttack/PoisonRec: CheatAgent uses an LLM agent instead of RL, enabling it to handle complex textual inputs and context which RL agents fail to process effectively.
vs. Traditional Methods: CheatAgent incorporates open-world knowledge and reasoning via the LLM, whereas traditional methods optimize from scratch without such priors.
Limitations
Computational cost of using an LLM as an attack agent is likely higher than simple RL agents.
The approach relies on the 'black-box' assumption where outputs are observable; if outputs are hidden, the feedback loop breaks.
Success depends on the LLM agent's ability to align its domain knowledge with the specific RecSys domain.
Reproducibility
No replication artifacts are mentioned in the available text. It refers to 'three real-world datasets' without naming them, and it does not provide specific hyperparameters or model weights.
📊 Experiments & Results
Evaluation Setup
Black-box attack on LLM-empowered RecSys
Benchmarks:
Three real-world datasets (Sequential Recommendation / Top-N Recommendation)
Metrics:
Recommendation Loss (Negative Log-Likelihood)
Recommendation Performance (likely HR/NDCG, though these metrics are not named in the available text)
Statistical methodology: Not explicitly reported in the paper
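The summary names NLL and only implies HR/NDCG. A minimal sketch of how these are commonly computed for a single ground-truth next item, assuming the victim exposes a probability over candidate items (for NLL) and a ranked list (for HR/NDCG). A successful attack raises NLL and lowers HR and NDCG. The item IDs other than 'item_1072' are made up for illustration.

```python
import math
from typing import Dict, List


def nll(probs: Dict[str, float], target: str) -> float:
    """Negative log-likelihood of the ground-truth item under the victim's output distribution."""
    return -math.log(probs[target])


def hit_rate_at_k(ranking: List[str], target: str, k: int) -> float:
    """1.0 if the ground-truth item appears in the top-k list, else 0.0."""
    return 1.0 if target in ranking[:k] else 0.0


def ndcg_at_k(ranking: List[str], target: str, k: int) -> float:
    """NDCG@k with one relevant item: 1 / log2(rank + 1), with a 1-based rank."""
    top = ranking[:k]
    if target not in top:
        return 0.0
    rank = top.index(target) + 1
    return 1.0 / math.log2(rank + 1)


target = "item_1010"  # hypothetical ground-truth next item
clean = ["item_1010", "item_7", "item_3"]
attacked = ["item_1072", "item_7", "item_1010"]
print(hit_rate_at_k(clean, target, 1), hit_rate_at_k(attacked, target, 1))  # 1.0 0.0
print(ndcg_at_k(clean, target, 3), ndcg_at_k(attacked, target, 3))  # 1.0 0.5
```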
Main Takeaways
The paper claims extensive experiments demonstrate the safety vulnerability of LLM-empowered RecSys against the proposed attacks.
The method focuses on untargeted attacks, which aim to degrade recommendation quality so that the system recommends irrelevant items, rather than to promote one specific item.
The framework allows for both prompt perturbations (adding words) and profile perturbations (adding fake item interactions).
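Both perturbation types are insertions into X; they differ only in what is inserted and where. A minimal sketch, assuming the Hamming budget is measured over a "slotted" copy of the prompt with an empty slot around every token. The template wording, slot indices, the word 'premium', and all function names are hypothetical.

```python
from typing import Dict, List

EMPTY = ""  # marks an unused insertion slot


def slotted_prompt(user: str, history: List[str]) -> List[str]:
    """Fill a hypothetical template P with user u and history V, placing an empty
    slot before every token and after the last so insertions keep the length fixed."""
    words = [user, "interacted", "with", *history, "next", "item", "?"]
    seq = [EMPTY]
    for w in words:
        seq += [w, EMPTY]
    return seq


def perturb(seq: List[str], fills: Dict[int, str]) -> List[str]:
    """Insert words (prompt perturbation) or fake items (profile perturbation) into empty slots."""
    out = list(seq)
    for idx, tok in fills.items():
        if out[idx] != EMPTY:
            raise ValueError(f"index {idx} is not an empty slot")
        out[idx] = tok
    return out


def hamming(a: List[str], b: List[str]) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def render(seq: List[str]) -> str:
    """Drop empty slots to get the text actually sent to the victim."""
    return " ".join(t for t in seq if t != EMPTY)


X = slotted_prompt("User_637", ["item_1009"])
profile = perturb(X, {8: "item_1072"})  # fake interaction appended to the history
prompt = perturb(X, {10: "premium"})    # extra word inserted into the template text
print(render(profile))  # User_637 interacted with item_1009 item_1072 next item ?
print(render(prompt))   # User_637 interacted with item_1009 next premium item ?
print(hamming(X, profile), hamming(X, perturb(X, {8: "item_1072", 10: "premium"})))  # 1 2
```

The slotted representation is a choice made here because Hamming distance is only defined for equal-length sequences; with it, the distance simply counts inserted tokens, so a budget ε caps the number of insertions.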
📚 Prerequisite Knowledge
Prerequisites
Basics of Recommender Systems (RecSys)
Adversarial Attacks (Black-box setting)
Large Language Models (LLMs) and Prompt Tuning
Key Terms
LLM-empowered RecSys: Recommender systems that use Large Language Models to encode textual information (reviews, titles) and reason about user preferences
Black-box attack: An attack setting where the adversary has no access to the victim model's gradients or parameters and can only query it with inputs and observe its outputs
Perturbation: Malicious modifications (e.g., inserted words or fake item interactions) added to the input data to deceive the model
RL-based agents: Reinforcement Learning agents used in prior work to attack recommender systems by learning policies to inject malicious items
Hamming distance: The number of positions at which two equal-length sequences differ; used here to bound the magnitude of the attack by comparing the original and adversarial inputs
Self-reflection: A mechanism allowing the LLM agent to analyze feedback from the victim system to refine its future attack strategies
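A minimal sketch of a feedback-driven refinement loop in the spirit of the self-reflection term above. The paper's version uses the attacker LLM with prompt tuning; here that agent is swapped for a hypothetical greedy stub that reads past (perturbation, loss) feedback, and prompt tuning is not modeled. All names and loss values are illustrative assumptions.

```python
from typing import Callable, Dict, List, Optional, Tuple

Feedback = List[Tuple[str, float]]  # (tried perturbation, observed victim loss)


def make_stub_agent(
    seeds: List[str], neighbors: Dict[str, List[str]]
) -> Callable[[Feedback], Optional[str]]:
    """Hypothetical stand-in for the attacker LLM: it reads the feedback, expands
    the best-scoring perturbation so far, and otherwise falls back to untried seeds."""
    def propose(feedback: Feedback) -> Optional[str]:
        tried = {cand for cand, _ in feedback}
        if feedback:
            best = max(feedback, key=lambda f: f[1])[0]
            for nxt in neighbors.get(best, []):
                if nxt not in tried:
                    return nxt
        for seed in seeds:
            if seed not in tried:
                return seed
        return None  # nothing left to try
    return propose


def refine_attack(
    propose: Callable[[Feedback], Optional[str]],
    victim_loss: Callable[[str], float],
    rounds: int,
) -> Tuple[Optional[str], Feedback]:
    """Query the black-box victim, record its feedback, and let the agent use it next round."""
    feedback: Feedback = []
    for _ in range(rounds):
        cand = propose(feedback)
        if cand is None:
            break
        feedback.append((cand, victim_loss(cand)))
    best = max(feedback, key=lambda f: f[1])[0] if feedback else None
    return best, feedback


# Hypothetical victim losses for a few inserted words.
losses = {"cheap": 0.2, "new": 0.5, "latest": 0.9, "brand-new": 0.4}
agent = make_stub_agent(seeds=["cheap", "new"], neighbors={"new": ["latest", "brand-new"]})
best, log = refine_attack(agent, lambda w: losses.get(w, 0.0), rounds=5)
print(best, [w for w, _ in log])  # latest ['cheap', 'new', 'latest']
```

The loop only needs observable outputs from the victim, which is also why the approach breaks down if those outputs are hidden.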