Unleashing the Power of Large Language Model for Denoising Recommendation

📝 Paper Summary

Recommender Systems Denoising Implicit Feedback

LLaRD utilizes Large Language Models to generate semantic and relational knowledge from interaction graphs, then applies the Information Bottleneck principle to filter noise and hallucinations for robust recommendation.

Core Problem

Implicit feedback in recommender systems is inherently noisy (e.g., accidental clicks), and existing denoising methods rely on limited observational data or rigid assumptions that fail to capture true user intent.

Why it matters:

False positive interactions (accidental clicks) and false negatives (unexposed items) severely degrade recommendation accuracy.
Current methods struggle to identify 'noise' that actually represents latent interests (e.g., an art lover clicking a gardening video might indicate a new hobby, not noise).
LLMs have world knowledge but struggle to directly process complex collaborative graph structures or align their broad knowledge with specific recommendation targets.

Concrete Example: If an art enthusiast clicks a gardening video, traditional methods label the click as noise because it does not match the user's profile. LLaRD uses an LLM to reason that a shared theme such as 'gardening sketches' links the two, and so treats the click as a potential latent interest rather than pure noise.

Key Novelty

LLM-enhanced Recommendation Denoiser (LLaRD)

Generates 'Preference Knowledge' by using LLMs to infer user profiles and item characteristics from text, expanding the scope of observational data.
Generates 'Relation Knowledge' via a user-centric Chain-of-Thought (CoT) on the interaction graph, reasoning about multi-hop neighbors to find collaborative signals.
Applies the Information Bottleneck (IB) principle to align this generated knowledge with the recommendation task, explicitly filtering out LLM hallucinations and irrelevant noise.

Architecture

The overall framework of LLaRD, detailing the Knowledge Generation Module and the Knowledge-Enhanced Denoising Module.

Evaluation Highlights

Outperforms state-of-the-art denoising methods (e.g., RGCF, ROC) by significant margins on Amazon-Book, Yelp, and TikTok datasets.
Achieves up to a 14.29% relative improvement in Recall@20 on the TikTok dataset compared to the best baseline (0.0602 to 0.0688 with the LightGCN backbone).
Demonstrates robustness to noise, maintaining performance even when 20% additional noise is injected into the training data.

Breakthrough Assessment

8/10

Strong methodological contribution effectively bridging LLMs, graph reasoning, and information-theoretic denoising. The integration of CoT on graphs for noise detection is particularly novel.

⚙️ Technical Details

Problem Definition

Setting: Denoising Recommendation with Implicit Feedback

Inputs: User set U, Item set I, Interaction matrix R (implicit feedback with potential noise), Textual side information

Outputs: Cleaned interaction probabilities or noise-free user/item representations for predicting unobserved interactions

Pipeline Flow

Knowledge Generation (LLM-based Preference & Relation Mining)
Representation Learning (Encoding Graph & Knowledge)
Knowledge-Enhanced Denoising (IB-based Optimization)

System Modules

Preference Knowledge Generator (Knowledge Generation)

Extracts semantic user profiles and item features from text.

Model or implementation: LLM (e.g., GPT-3.5/4 or open variants)

Relation Knowledge Generator (Knowledge Generation)

Mines collaborative signals and identifies noise via graph reasoning.

Model or implementation: LLM with CoT Prompting
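
To make the graph-reasoning step more concrete, the sketch below collects a user's two-hop neighbors (other users who share at least one item) and formats them into a step-by-step prompt. It is only an illustration: the function names, the prompt wording, and the choice to rank neighbors by shared-item count are assumptions, not the paper's actual templates.

```python
from collections import defaultdict


def build_adjacency(interactions):
    """Map each user to its items and each item to its users."""
    user_items, item_users = defaultdict(set), defaultdict(set)
    for user, item in interactions:
        user_items[user].add(item)
        item_users[item].add(user)
    return user_items, item_users


def two_hop_neighbors(user, user_items, item_users):
    """Return {other_user: number of items shared with `user`}."""
    overlap = defaultdict(int)
    for item in user_items[user]:
        for other in item_users[item]:
            if other != user:
                overlap[other] += 1
    return dict(overlap)


def relation_prompt(user, user_items, item_users, item_titles, max_neighbors=3):
    """Format a hypothetical step-by-step prompt about one user's neighborhood."""
    neighbors = two_hop_neighbors(user, user_items, item_users)
    # Most-overlapping neighbors first; ties broken by user id for determinism.
    top = sorted(neighbors, key=lambda u: (-neighbors[u], u))[:max_neighbors]
    own = ", ".join(item_titles[i] for i in sorted(user_items[user]))
    lines = [f"User {user} interacted with: {own}"]
    for other in top:
        theirs = ", ".join(item_titles[i] for i in sorted(user_items[other]))
        lines.append(
            f"Neighbor {other} (shares {neighbors[other]} item(s)) interacted with: {theirs}"
        )
    lines.append("Step 1: Summarize the interests the user shares with these neighbors.")
    lines.append("Step 2: For each of the user's items, say whether it fits those interests or looks like noise.")
    return "\n".join(lines)
```

The resulting string would be sent to whichever LLM is used for knowledge generation; the call itself is left out because the source does not fix a specific model or API.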

Denoising Module (IB)

Filters noise and aligns generated knowledge with the recommendation task.

Model or implementation: Information Bottleneck Objective

Novel Architectural Elements

User-centric Graph Chain-of-Thought (CoT): Adapts CoT reasoning specifically for traversing user-item interaction graphs to identify noise and latent interests.
Dual-Knowledge IB Alignment: Applies Information Bottleneck specifically to align diverse LLM-generated knowledge (Preference + Relation) with collaborative filtering signals.

Modeling

Base Model: LightGCN or SASRec (as backbones for recommendation)

Training Method: Joint training of recommendation loss and IB-regularized denoising loss

Objective Functions:

Purpose: Maximize prediction accuracy.

Formally: BPR Loss (Bayesian Personalized Ranking).

Purpose: Enforce the Information Bottleneck principle for denoising.

Formally: IB Loss = -I(Z; Y) + beta * I(Z; X), implemented via variational bounds.
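
As a rough illustration of how the two objectives could be combined, the sketch below pairs a BPR term with a Gaussian KL penalty, which is a common variational upper bound on I(Z; X). Several things here are assumptions made for illustration: Z is read as the learned representation, X as the input, and Y as the recommendation target; BPR stands in as the surrogate for -I(Z; Y); the encoder is taken to be a diagonal Gaussian; and the function names are hypothetical. The paper's exact bounds may differ.

```python
import math


def bpr_loss(pos_scores, neg_scores):
    """Mean of -log(sigmoid(pos - neg)) over paired scores."""
    total = 0.0
    for p, n in zip(pos_scores, neg_scores):
        # -log(sigmoid(p - n)) == softplus(n - p); this form avoids overflow.
        x = n - p
        total += max(x, 0.0) + math.log1p(math.exp(-abs(x)))
    return total / len(pos_scores)


def gaussian_kl(mu, log_var):
    """KL( N(mu, exp(log_var)) || N(0, 1) ), summed over dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def ib_objective(pos_scores, neg_scores, mu, log_var, beta=0.01):
    """Prediction term plus beta-weighted compression term (assumed form)."""
    return bpr_loss(pos_scores, neg_scores) + beta * gaussian_kl(mu, log_var)
```

The default `beta=0.01` sits at the lower end of the 1e-2 to 1e-1 range listed under Key Hyperparameters; larger values push the representation harder toward the prior and discard more input information.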

Key Hyperparameters:

learning_rate: 1e-3 (LightGCN), 1e-4 (SASRec)
batch_size: 2048
embedding_dimension: 64
IB_beta: 1e-2 to 1e-1 (varies by dataset)

Compute: Not explicitly reported in the paper. The setup suggests standard GPU training for the recommendation backbones, with LLM inference run as a preprocessing step.

Comparison to Prior Work

vs. RGCF/HIDC: LLaRD uses external world knowledge from LLMs to judge noise, rather than relying solely on internal data consistency.
vs. LLM-Rec: LLaRD specifically focuses on the *denoising* capability of LLMs via CoT and IB, rather than just enriching representations.
vs. AutoDenoiser [not cited in paper]: LLaRD incorporates graph structural reasoning (CoT) rather than just instance-level filtering.

Limitations

Heavy reliance on LLM inference for preprocessing interaction graphs, which can be costly for very large datasets.
The quality of denoising depends on the LLM's inherent world knowledge; hallucinations could theoretically introduce new noise (though IB mitigates this).
Effectiveness might vary for domains with sparse textual descriptions where LLMs cannot infer strong preferences.

Reproducibility

Code: https://github.com/shuyao-wang/LLaRD

Code is publicly available. The paper details the prompt templates for preference and relation generation. The specific LLM version used for generation (e.g., the exact GPT checkpoint) is not precisely specified; the description suggests a standard instruction-tuned model.

📊 Experiments & Results

Evaluation Setup

Top-K Recommendation on implicit feedback datasets

Benchmarks:

Amazon-Book (E-commerce Recommendation)
Yelp (Business/Restaurant Recommendation)
TikTok (Short Video Recommendation)

Metrics:

Recall@20
NDCG@20
Statistical methodology: Not explicitly reported in the paper
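
For readers less familiar with the two metrics listed above, here is a minimal sketch of how Recall@K and NDCG@K are commonly computed for a single user with binary relevance. The function names are hypothetical, and the conventions used (recall divided by the number of relevant items, a log2 discount for NDCG) are the usual ones but are assumed here rather than confirmed from the paper's evaluation code.

```python
import math


def recall_at_k(ranked_items, relevant_items, k=20):
    """Fraction of the user's relevant items that appear in the top k."""
    if not relevant_items:
        return 0.0
    hits = sum(1 for item in ranked_items[:k] if item in relevant_items)
    return hits / len(relevant_items)


def ndcg_at_k(ranked_items, relevant_items, k=20):
    """DCG of the top k divided by the DCG of an ideal ranking (binary gains)."""
    dcg = sum(
        1.0 / math.log2(rank + 2)  # rank is 0-based, so the top slot gets 1/log2(2)
        for rank, item in enumerate(ranked_items[:k])
        if item in relevant_items
    )
    ideal_hits = min(len(relevant_items), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0
```

Dataset-level scores are typically the average of these per-user values over all test users.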

Key Results

Benchmark	Metric	Baseline	This Paper	Δ
Main performance comparison with LightGCN backbone across three datasets.
Amazon-Book	Recall@20	0.0468	0.0514	+0.0046
Yelp	Recall@20	0.0694	0.0766	+0.0072
TikTok	Recall@20	0.0602	0.0688	+0.0086
Main performance comparison with SASRec backbone across three datasets.
TikTok	Recall@20	0.0718	0.0782	+0.0064
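
The Δ column above reports absolute differences, while the +14.29% figure in Evaluation Highlights is a relative gain. The short sketch below (helper name is hypothetical) shows the conversion using only the numbers from the table.

```python
def relative_gain(baseline, proposed):
    """Relative improvement of `proposed` over `baseline`, in percent."""
    return (proposed - baseline) / baseline * 100.0


# (backbone, dataset) -> (baseline Recall@20, LLaRD Recall@20), copied from the table.
rows = {
    ("LightGCN", "Amazon-Book"): (0.0468, 0.0514),
    ("LightGCN", "Yelp"): (0.0694, 0.0766),
    ("LightGCN", "TikTok"): (0.0602, 0.0688),
    ("SASRec", "TikTok"): (0.0718, 0.0782),
}

for (backbone, dataset), (baseline, proposed) in rows.items():
    print(f"{backbone:8s} {dataset:12s} {relative_gain(baseline, proposed):6.2f}%")

# The LightGCN / TikTok row gives 14.29%, matching the headline figure.
```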

Experiment Figures

Performance comparison (Recall@20) under varying noise ratios (0% to 20%) on Amazon-Book and Yelp.
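
One plausible way to set up such a robustness test is to add randomly sampled, previously unobserved user-item pairs to the training set as fake positives. The sketch below does this under stated assumptions: items are integer ids in `range(num_items)`, noise is sampled uniformly, and the function name and signature are hypothetical. The paper's exact injection protocol is not described in this summary and may differ.

```python
import random


def inject_noise(interactions, num_items, ratio=0.2, seed=0):
    """Append round(ratio * len(unique interactions)) random unobserved pairs."""
    rng = random.Random(seed)
    observed = set(interactions)
    users = sorted({user for user, _ in observed})
    target = round(len(observed) * ratio)
    # Guard against asking for more unobserved pairs than exist.
    if target > len(users) * num_items - len(observed):
        raise ValueError("not enough unobserved pairs to reach the requested ratio")
    noisy = set()
    while len(noisy) < target:
        pair = (rng.choice(users), rng.randrange(num_items))
        if pair not in observed:
            noisy.add(pair)
    # Original interactions come first, followed by the injected ones.
    return list(interactions) + sorted(noisy)
```

Only the training split would be perturbed in this kind of test; validation and test interactions stay clean so that the metric still measures true preferences.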

Ablation study on the impact of different knowledge components (w/o Preference, w/o Relation, w/o IB).

Main Takeaways

LLaRD consistently improves performance over both standard backbones (LightGCN, SASRec) and dedicated denoising baselines (RGCF, ROC).
Ablation studies confirm that both Preference Knowledge and Relation Knowledge (via CoT) are essential; removing either degrades performance.
The method is robust: as the noise ratio increases from 0% to 20%, LLaRD maintains a larger performance gap over baselines, which suggests it remains effective in high-noise regimes.

📚 Prerequisite Knowledge

Prerequisites

Recommender Systems (Implicit Feedback)
Large Language Models (Prompting, CoT)
Graph Neural Networks (GNNs)
Information Bottleneck (IB) Principle

Key Terms

LLaRD: Large Language Model-enhanced Recommendation Denoiser—the proposed framework.

CoT: Chain-of-Thought—a prompting strategy where the model generates intermediate reasoning steps.

Information Bottleneck (IB): An information-theoretic principle that learns representations by maximizing relevant information (to the target) while minimizing irrelevant information (compression).

False Positive: An observed interaction (e.g., click) that does not reflect a true user preference (noise).

False Negative: A true user preference that was not observed in the data (e.g., due to lack of exposure).

BPR: Bayesian Personalized Ranking—a standard pairwise loss function for optimizing recommender systems.

LightGCN: A simplified Graph Convolutional Network for recommendation that drops feature transformations and nonlinear activations, keeping only neighbor aggregation.

SASRec: Self-Attentive Sequential Recommendation—a sequence-based recommendation model used as a backbone.