The Hong Kong Polytechnic University, Hong Kong SAR, China
arXiv, 4/2025
Recommendation, RAG, P13N
📝 Paper Summary
LLM-based recommendation, Robustness against shilling attacks, Retrieval-Augmented Generation (RAG)
RETURN protects LLM-based recommender systems from malicious interaction noise during inference by retrieving collaborative item subgraphs to identify and purify inconsistent user behaviors.
Core Problem
LLM-empowered recommender systems are highly vulnerable during inference to minor perturbations (e.g., clickbait items) in user history, which mislead the model into generating poor recommendations.
Why it matters:
Existing defenses focus on training-time purification, failing to protect well-trained models from noisy inference-time interactions caused by clickbait or attacks
A single irrelevant item in a user's sequence can drastically alter the LLM's output, because LLMs are highly sensitive to their input prompts
Current LLM-based recommenders rely on the model's internal knowledge and often miss the collaborative signals (item co-occurrence) needed to verify whether an interaction is genuine or noise
Concrete Example: If the history of a user shopping for 'suits' and 'dresses' also contains 'ties' (a perturbation item, whether clicked by accident or inserted by an attacker), the LLM might fixate on 'ties' and fail to discern the user's true intent, recommending irrelevant accessories instead of the main clothing items.
Key Novelty
Retrieval-Augmented Purification (RETURN)
Constructs collaborative item graphs from external datasets to capture item-item co-occurrence patterns (e.g., items frequently bought together)
Retrieves subgraphs for a user's interaction sequence to score the consistency of each item; items with low collaborative support are flagged as noise
Purifies the user profile by deleting or replacing flagged items and uses an ensemble strategy to combine predictions from multiple purified profiles
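The graph-building and scoring ideas above can be illustrated with a small sketch. It is not the paper's implementation: the function names, the "count sequences containing both items" edge weight, and the "sum of edge weights to the rest of the history" score are assumptions chosen for clarity, since this summary does not give the exact scoring function.

```python
from itertools import combinations


def build_cooccurrence_graph(sequences):
    """Return {item: {neighbor: count}}, where count is the number of
    external sequences that contain both items (assumed edge weight)."""
    graph = {}
    for seq in sequences:
        # Deduplicate so each sequence contributes at most 1 per pair.
        for a, b in combinations(sorted(set(seq)), 2):
            graph.setdefault(a, {})
            graph.setdefault(b, {})
            graph[a][b] = graph[a].get(b, 0) + 1
            graph[b][a] = graph[b].get(a, 0) + 1
    return graph


def consistency_scores(history, graph):
    """Score each history item by its total edge weight to the other
    items in the same history (assumed stand-in for the paper's score)."""
    items = set(history)
    return {
        item: sum(
            graph.get(item, {}).get(other, 0)
            for other in items
            if other != item
        )
        for item in history
    }


# Toy external data (hypothetical item names).
external = [
    ["suit", "dress", "shoes"],
    ["suit", "dress"],
    ["dress", "shoes"],
    ["phone case", "charger"],
]
graph = build_cooccurrence_graph(external)

history = ["suit", "dress", "shoes", "phone case"]
scores = consistency_scores(history, graph)
suspect = min(scores, key=scores.get)  # item with least collaborative support
print(scores, suspect)
```

In this toy data the item with no co-occurrence support from the rest of the history receives the lowest score and would be the one flagged as potential noise.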
Architecture
[Figure: the overall framework of RETURN, illustrating the flow from a perturbed user sequence to a robust recommendation via collaborative graph retrieval.]
Evaluation Highlights
Significantly outperforms baselines like TALLRec and various defense strategies on three real-world datasets (Amazon Beauty, Sports, Toys)
Demonstrates robustness against both random noise and adversarial perturbations, maintaining high Hit Ratio and NDCG where other models degrade
Plug-and-play capability allows integration with existing LLM-RecSys without retraining the LLM backbone
Breakthrough Assessment
7/10
Novel application of RAG for inference-time purification in RecSys. Addresses a critical vulnerability (inference noise) that training-time defenses miss, with a practical, training-free approach.
⚙️ Technical Details
Problem Definition
Setting: Robust sequential recommendation using LLMs under inference-time perturbations
Inputs: User u and a perturbed interaction history I_u (genuine interactions mixed with noise or adversarial perturbations)
Outputs: Recommended item y (or ranked list)
Pipeline Flow
Graph Construction: Build item-item co-occurrence graph from external data
Retrieval: For a user sequence, retrieve relevant subgraphs
Positioning: Identify low-consistency items (potential perturbations) using collaborative signals
Purification: Generate multiple cleansed profile views via Deletion or Replacement
Prediction: LLM generates recommendations for each view
Ensemble: Aggregate predictions for final robust output
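The last two pipeline steps (Prediction and Ensemble) could look roughly like the sketch below. The aggregation rule is an assumption: this summary only says predictions are aggregated, so a simple vote count across views is used here, with ties broken by best rank and then alphabetically. `toy_recommend` is a hypothetical stand-in for the LLM backbone call, and all item names are invented.

```python
from collections import Counter


def ensemble_recommend(views, recommend, top_k=3):
    """Run `recommend` on each purified view and rank items by how many
    views recommended them (assumed voting rule, not the paper's)."""
    votes = Counter()
    best_rank = {}
    for view in views:
        for rank, item in enumerate(recommend(view)):
            votes[item] += 1
            best_rank[item] = min(rank, best_rank.get(item, rank))
    # More votes first, then better (lower) rank, then name for determinism.
    ranked = sorted(votes, key=lambda it: (-votes[it], best_rank[it], it))
    return ranked[:top_k]


def toy_recommend(view):
    """Hypothetical stand-in for the LLM recommender: look up canned
    suggestions for each item in the view."""
    catalog = {
        "suit": ["blazer", "dress shirt"],
        "dress": ["heels", "blazer"],
        "tie": ["tie clip", "cufflinks"],
    }
    out = []
    for item in view:
        for rec in catalog.get(item, []):
            if rec not in out and rec not in view:
                out.append(rec)
    return out


# Three purified views of one profile; the second still contains the noise item.
views = [["suit", "dress"], ["suit", "dress", "tie"], ["dress"]]
result = ensemble_recommend(views, toy_recommend, top_k=2)
print(result)
```

Because the noise-driven suggestions appear in only one view, they receive a single vote and fall out of the top results, which is the intuition behind combining several purified profiles.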
System Modules
Collaborative Graph Constructor
Encodes external user sequences into an item-item graph where edge weights represent co-occurrence frequency
Model or implementation: Graph construction algorithm
Perturbation Positioner (Purification)
Scores each item in the user sequence based on its connectivity to other items in the retrieved subgraph
Model or implementation: Retrieval-augmented scoring function
Profile Cleanser (Purification)
Creates purified versions of the user profile by removing or replacing low-score items
Model or implementation: Heuristic strategies (Deletion/Replacement)
Robust Ensemble Recommender
Generates recommendations for each purified profile and aggregates them
Model or implementation: LLM Backbone (e.g., TALLRec)
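As a rough illustration of the Profile Cleanser's two strategies, the sketch below implements Deletion and Replacement over a toy co-occurrence graph. The replacement rule (pick the unused neighbor with the highest total edge weight to the retained items) is an assumption; this summary does not state how replacement items are chosen. The graph layout `{item: {neighbor: weight}}` and all names are hypothetical.

```python
def purify_by_deletion(history, flagged):
    """Drop every flagged item, preserving the order of the rest."""
    return [item for item in history if item not in flagged]


def purify_by_replacement(history, flagged, graph):
    """Swap each flagged item for the unused candidate with the highest
    total co-occurrence weight to the retained items (assumed rule)."""
    kept = [item for item in history if item not in flagged]
    used = set(history)
    purified = []
    for item in history:
        if item not in flagged:
            purified.append(item)
            continue
        support = {}
        for anchor in kept:
            for cand, weight in graph.get(anchor, {}).items():
                if cand not in used:
                    support[cand] = support.get(cand, 0) + weight
        if support:
            # sorted() makes tie-breaking deterministic (alphabetical).
            best = max(sorted(support), key=support.get)
            purified.append(best)
            used.add(best)
        # With no candidate available, the flagged item is simply dropped.
    return purified


# Toy graph with hypothetical items and weights.
toy_graph = {
    "suit": {"dress": 2, "shoes": 1, "belt": 1},
    "dress": {"suit": 2, "shoes": 2},
    "shoes": {"suit": 1, "dress": 2, "belt": 2},
}
history = ["suit", "dress", "phone case"]
flagged = {"phone case"}

deleted_view = purify_by_deletion(history, flagged)
replaced_view = purify_by_replacement(history, flagged, toy_graph)
print(deleted_view, replaced_view)
```

Deletion shortens the profile, while replacement keeps its length by substituting an item the collaborative graph supports; producing both gives the recommender more than one purified view to predict from.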
Novel Architectural Elements
Inference-time RAG loop specifically for denoising input prompts rather than augmenting context for generation
Collaborative-graph-based scoring mechanism to identify outliers in sequential textual data
Modeling
Base Model: An LLM-based recommender serves as the backbone, for example TALLRec (built on LLaMA-7B)
Training Method: Training-free inference-time defense
Compute: Not reported in the paper
Comparison to Prior Work
vs. GraphRfi/LoRec: RETURN targets inference-time perturbations rather than training-set purification
vs. LLM4DASR: RETURN uses external collaborative graphs for verification instead of internal uncertainty estimation
vs. TALLRec: RETURN adds a purification layer before the TALLRec model to handle noise