Retrieval-Augmented Purifier for Robust LLM-Empowered Recommendation

📝 Paper Summary

LLM-based recommendation · Robustness against shilling attacks · Retrieval-Augmented Generation (RAG)

RETURN protects LLM-based recommender systems from malicious interaction noise during inference by retrieving collaborative item subgraphs to identify and purify inconsistent user behaviors.

Core Problem

LLM-empowered recommender systems are highly vulnerable during inference to minor perturbations (e.g., clickbait items) in user history, which mislead the model into generating poor recommendations.

Why it matters:

Existing defenses focus on training-time purification, failing to protect well-trained models from noisy inference-time interactions caused by clickbait or attacks
A single irrelevant item in a user's sequence can drastically alter the LLM's output, because LLMs are sensitive to small changes in the input prompt
Current LLM-RecSys rely on internal knowledge, often missing the collaborative signals (item co-occurrence) needed to verify if an interaction is genuine or noise

Concrete Example: If a user shopping for 'suits' and 'dresses' has 'ties' added to their history, whether through an accidental click or as a perturbation item inserted by attackers, the LLM might fixate on 'ties' and fail to discern the user's true intent, recommending irrelevant accessories instead of the main clothing items.

Key Novelty

Retrieval-Augmented Purification (RETURN)

Constructs collaborative item graphs from external datasets to capture item-item co-occurrence patterns (e.g., items frequently bought together)
Retrieves subgraphs for a user's interaction sequence to score the consistency of each item; items with low collaborative support are flagged as noise
Purifies the user profile by deleting or replacing flagged items and uses an ensemble strategy to combine predictions from multiple purified profiles

Architecture

The overall framework of RETURN, illustrating the flow from a perturbed user sequence to robust recommendation via collaborative graph retrieval.

Evaluation Highlights

Significantly outperforms baselines like TALLRec and various defense strategies on three real-world datasets (Amazon Beauty, Sports, Toys)
Demonstrates robustness against both random noise and adversarial perturbations, maintaining high Hit Ratio and NDCG where other models degrade
Plug-and-play capability allows integration with existing LLM-RecSys without retraining the LLM backbone

Breakthrough Assessment

7/10

Novel application of RAG for inference-time purification in RecSys. Addresses a critical vulnerability (inference noise) that training-time defenses miss, with a practical, training-free approach.

⚙️ Technical Details

Problem Definition

Setting: Robust sequential recommendation using LLMs under inference-time perturbations

Inputs: User u, perturbed interaction history I_u (containing target items and noise/perturbations)

Outputs: Recommended item y (or ranked list)

Pipeline Flow

Graph Construction: Build item-item co-occurrence graph from external data
Retrieval: For a user sequence, retrieve relevant subgraphs
Positioning: Identify low-consistency items (potential perturbations) using collaborative signals
Purification: Generate multiple cleansed profile views via Deletion or Replacement
Prediction: LLM generates recommendations for each view
Ensemble: Aggregate predictions for final robust output
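
The steps above can be read as a simple composition of functions. The following is a minimal sketch of that control flow only, not the paper's implementation: every function name is hypothetical, graph construction and retrieval are assumed to be hidden inside `score_fn`, and the LLM is replaced by a toy stand-in.

```python
def robust_recommend(history, score_fn, purifiers, recommend_fn, aggregate_fn):
    """Run positioning, purification, prediction, and ensembling in order."""
    scores = score_fn(history)                                  # Positioning
    views = [purify(history, scores) for purify in purifiers]   # Purification
    predictions = [recommend_fn(view) for view in views]        # Prediction
    return aggregate_fn(predictions)                            # Ensemble


# Toy stand-ins (all hypothetical) so the sketch runs end to end.
def toy_score(history):
    # Pretend retrieval found no collaborative support for "tie".
    return [0 if item == "tie" else 1 for item in history]


def delete_low(history, scores):
    return [item for item, s in zip(history, scores) if s > 0]


def replace_low(history, scores):
    return [item if s > 0 else "shoes" for item, s in zip(history, scores)]


def toy_llm(view):
    # Placeholder for the LLM call: echoes the last item in the view.
    return "more like " + view[-1]


def majority_vote(predictions):
    # Ties are broken alphabetically so the result is deterministic.
    return max(sorted(set(predictions)), key=predictions.count)


result = robust_recommend(
    ["suit", "tie", "dress"], toy_score, [delete_low, replace_low], toy_llm, majority_vote
)
```

Passing the stages in as callables keeps the LLM backbone untouched, which matches the plug-and-play framing of the summary.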

System Modules

Collaborative Graph Constructor

Encodes external user sequences into an item-item graph where edge weights represent co-occurrence frequency

Model or implementation: Graph construction algorithm
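
As an illustration of what such a graph could look like in code, here is a small sketch. It assumes that two items co-occur whenever they appear in the same user sequence and that each pair is counted once per sequence; the summary does not state the exact counting rule, so treat both choices, and the function name, as assumptions.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(sequences):
    """Return {item: {neighbor: count}} built from external user sequences."""
    graph = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        # Assumption: count each unordered pair once per sequence.
        for a, b in combinations(sorted(set(seq)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return {item: dict(neighbors) for item, neighbors in graph.items()}


external_sequences = [
    ["suit", "dress", "shoes"],
    ["suit", "dress"],
    ["tie", "shirt"],
]
graph = build_cooccurrence_graph(external_sequences)
```

In this toy data, 'suit' and 'dress' share an edge of weight 2, while 'tie' has no edge to either of them.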

Perturbation Positioner (Purification)

Scores each item in the user sequence based on its connectivity to other items in the retrieved subgraph

Model or implementation: Retrieval-augmented scoring function
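
One simple way to turn connectivity into a score is to sum the edge weights between an item and every other item in the same sequence. The sketch below does exactly that; the scoring rule, the function name, and the toy graph are assumptions for illustration, and the paper's actual scoring function may differ.

```python
def consistency_scores(sequence, graph):
    """Score each position by its total co-occurrence with the other positions."""
    scores = []
    for i, item in enumerate(sequence):
        neighbors = graph.get(item, {})
        support = sum(
            neighbors.get(other, 0)
            for j, other in enumerate(sequence)
            if j != i
        )
        scores.append(support)
    return scores


# Hypothetical co-occurrence graph: {item: {neighbor: weight}}.
toy_graph = {
    "suit": {"dress": 2, "shoes": 1},
    "dress": {"suit": 2, "shoes": 1},
    "shoes": {"suit": 1, "dress": 1},
    "tie": {"shirt": 1},
    "shirt": {"tie": 1},
}
history = ["suit", "dress", "tie", "shoes"]
scores = consistency_scores(history, toy_graph)
```

Here 'tie' receives a score of 0 because none of its neighbors appear in the user's history, which marks it as a candidate perturbation.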

Profile Cleanser (Purification)

Creates purified versions of the user profile by removing or replacing low-score items

Model or implementation: Heuristic strategies (Deletion/Replacement)
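
The two strategies can be sketched as follows. The threshold, the rule for choosing a replacement (the unseen item with the strongest total co-occurrence with the kept items), and all names are assumptions made for this example rather than details taken from the paper.

```python
def purify_by_deletion(sequence, scores, threshold):
    """Drop every item whose consistency score is below the threshold."""
    return [item for item, s in zip(sequence, scores) if s >= threshold]


def purify_by_replacement(sequence, scores, graph, threshold):
    """Swap each low-score item for the best-supported item not already present."""
    kept = [item for item, s in zip(sequence, scores) if s >= threshold]
    # Total co-occurrence of each unseen candidate with the kept items.
    support = {}
    for item in kept:
        for neighbor, weight in graph.get(item, {}).items():
            if neighbor not in sequence:
                support[neighbor] = support.get(neighbor, 0) + weight
    result = []
    for item, s in zip(sequence, scores):
        if s >= threshold:
            result.append(item)
        elif support:
            # Alphabetical order breaks ties deterministically.
            result.append(max(sorted(support), key=support.get))
        # With no candidate available, the item is simply dropped.
    return result


# Hypothetical graph and scores for illustration.
toy_graph = {
    "suit": {"dress": 2, "shoes": 1, "belt": 1},
    "dress": {"suit": 2, "shoes": 1},
}
history = ["suit", "dress", "tie"]
scores = [2, 2, 0]
deleted_view = purify_by_deletion(history, scores, threshold=1)
replaced_view = purify_by_replacement(history, scores, toy_graph, threshold=1)
```

Producing both views from the same scores gives the ensemble step several cleansed profiles to work with.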

Robust Ensemble Recommender

Generates recommendations for each purified profile and aggregates them

Model or implementation: LLM Backbone (e.g., TALLRec)
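
The summary does not say how the predictions are aggregated. As one plausible option, the sketch below fuses the ranked lists by summing reciprocal ranks; this rule and the function name are assumptions, and the ranked lists are hard-coded stand-ins for LLM outputs.

```python
from collections import defaultdict


def aggregate_rankings(ranked_lists, top_k):
    """Fuse several ranked lists by summing reciprocal ranks per item."""
    fused = defaultdict(float)
    for ranking in ranked_lists:
        for rank, item in enumerate(ranking):
            fused[item] += 1.0 / (rank + 1)
    # Sort by fused score (descending), then by name for determinism.
    return sorted(fused, key=lambda item: (-fused[item], item))[:top_k]


# Hypothetical LLM outputs, one ranked list per purified profile.
per_view_rankings = [
    ["dress", "shoes", "tie"],
    ["dress", "belt", "shoes"],
    ["shoes", "dress", "belt"],
]
final = aggregate_rankings(per_view_rankings, top_k=2)
```

An item that ranks highly in only one view, such as 'tie' here, is outvoted by items that appear consistently across views.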

Novel Architectural Elements

Inference-time RAG loop specifically for denoising input prompts rather than augmenting context for generation
Collaborative-graph-based scoring mechanism to identify outliers in sequential textual data

Modeling

Base Model: Any LLM-based recommender can serve as the backbone; TALLRec (based on LLaMA-7B) is one example

Training Method: Training-free inference-time defense

Compute: Not reported in the paper

Comparison to Prior Work

vs. GraphRfi/LoRec: RETURN targets inference-time perturbations rather than training-set purification
vs. LLM4DASR: RETURN uses external collaborative graphs for verification instead of internal uncertainty estimation
vs. TALLRec: RETURN adds a purification layer before the TALLRec model to handle noise
vs. DL-based Denoising [not cited in paper]: Unlike AutoEncoder-based denoising, RETURN leverages explicit semantic co-occurrence from graphs

Limitations

Reliance on the quality and coverage of the external collaborative graph
Inference latency increases due to retrieval and ensemble (multiple LLM calls)
Performance depends on the underlying LLM's ability to handle the purified prompts

📊 Experiments & Results

Evaluation Setup

Sequential recommendation with adversarial/noisy perturbations inserted into user history

Benchmarks:

Amazon Beauty (Sequential Recommendation)
Amazon Sports (Sequential Recommendation)
Amazon Toys (Sequential Recommendation)

Metrics:

Hit Ratio (HR@10)
NDCG@10
Statistical methodology: Not explicitly reported in the paper

Main Takeaways

RETURN consistently improves robustness against perturbations compared to standard LLM-RecSys (TALLRec) and other defenses.
The use of collaborative graphs effectively identifies items that are statistically unlikely to co-occur with the user's core interests.
Ensemble strategies (combining multiple purified views) provide better stability than single-view purification.
The method is effective across different domains (Beauty, Sports, Toys), suggesting generalizability of the collaborative signal approach.

📚 Prerequisite Knowledge

Prerequisites

Sequential Recommendation
Large Language Models (LLMs)
Collaborative Filtering
Adversarial Attacks/Perturbations in RecSys

Key Terms

LLM-empowered RecSys: Recommender systems that use Large Language Models to process user history (often as text) and generate recommendations

RAG: Retrieval-Augmented Generation, which enhances model inputs by fetching relevant external data (here, collaborative signals) before generation

Collaborative Item Graph: A graph where nodes are items and edges represent co-occurrence (e.g., bought together) in user histories

Perturbation: Noise or malicious items inserted into a user's interaction sequence (e.g., via shilling attacks or clickbait) that skew recommendations

Purification: The process of identifying and removing/replacing noisy items from the user's interaction history before feeding it to the recommender

NDCG: Normalized Discounted Cumulative Gain—a measure of ranking quality that gives more weight to correct items appearing earlier in the list

Hit Ratio: The fraction of test cases in which the correct target item appears in the top-K recommendations
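
For concreteness, both metrics can be computed as below. The sketch assumes one correct target item per test case, in which case NDCG@K reduces to 1 / log2(rank + 1) for a target at 1-indexed position rank and 0 if it falls outside the top K. The function names and toy data are illustrative.

```python
import math


def hit_ratio_at_k(ranked_lists, targets, k):
    """Fraction of test cases whose target appears in the top-k list."""
    hits = sum(1 for ranking, t in zip(ranked_lists, targets) if t in ranking[:k])
    return hits / len(targets)


def ndcg_at_k(ranked_lists, targets, k):
    """Mean NDCG@k, assuming exactly one relevant item per test case."""
    total = 0.0
    for ranking, t in zip(ranked_lists, targets):
        top = ranking[:k]
        if t in top:
            # index() is 0-based, so position p contributes 1 / log2(p + 2).
            total += 1.0 / math.log2(top.index(t) + 2)
    return total / len(targets)


rankings = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
targets = ["a", "a", "z"]
hr = hit_ratio_at_k(rankings, targets, k=2)
ndcg = ndcg_at_k(rankings, targets, k=2)
```

Two of the three targets land in the top 2, so the hit ratio is 2/3, while NDCG is lower because the second target sits at rank 2 rather than rank 1.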