Hard vs. Noise: Resolving Hard-Noisy Sample Confusion in Recommender Systems via Large Language Models

📝 Paper Summary

Denoising Recommender Systems, LLM-enhanced Recommendation, Negative Sampling

LLMHNI leverages Large Language Models to distinguish valuable hard negative samples from noisy interactions in recommender systems by analyzing semantic embeddings and performing logical reasoning on user-item relevance.

Core Problem

Standard denoising methods in recommender systems rely on loss values to identify noise, but 'hard' samples (valuable for learning) and 'noisy' samples (misclicks) exhibit similar loss patterns, leading to the removal of informative data.

Why it matters:

Hard samples are theoretically and empirically vital for modeling accurate user preferences, but current denoisers mistakenly drop them.
Implicit feedback (clicks) is inherently noisy due to position bias and misclicks, but separating this noise from difficult user interests solely via numerical patterns (collaborative signals) is insufficient.

Concrete Example: A user clicks an item that the model scores low, so the interaction incurs a high loss. The click could be 'noisy' (a misclick that does not reflect preference) or 'hard' (a genuine interest that the model has not yet learned). A loss-based denoiser sees high loss in both cases and drops both. LLMHNI uses LLM reasoning to check whether the item actually matches the user's interest profile (Hard) or not (Noisy).
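As a toy illustration of why loss alone cannot tell the two cases apart, the sketch below applies a simple loss-threshold filter to three observed clicks. The scores, the threshold, and the `kind` labels are hypothetical values chosen for illustration; a real denoiser never sees `kind`.

```python
import math

def bce_loss(label: int, score: float) -> float:
    """Binary cross-entropy for one interaction with predicted probability `score`."""
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(score + eps) + (1 - label) * math.log(1 - score + eps))

# Hypothetical observed clicks (label = 1) with the model's predicted probability.
# "kind" is ground truth that a loss-based denoiser cannot observe.
clicks = [
    {"item": "a", "score": 0.90, "kind": "easy"},
    {"item": "b", "score": 0.15, "kind": "hard"},   # genuine but hard-to-model interest
    {"item": "c", "score": 0.12, "kind": "noisy"},  # misclick
]

def drop_high_loss(samples, threshold):
    """Keep only the samples whose loss is below `threshold`."""
    return [s for s in samples if bce_loss(1, s["score"]) < threshold]

kept = drop_high_loss(clicks, threshold=1.0)
kept_items = [s["item"] for s in kept]
print(kept_items)  # the hard and noisy clicks are removed together
```

The hard and noisy clicks have nearly identical losses, so any threshold that removes one also removes the other.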

Key Novelty

LLM-enhanced Hard-Noisy sample Identification (LLMHNI)

Utilizes **Semantic Relevance**: Projects LLM-encoded text embeddings into the recommendation space to identify 'hard negatives', i.e., items the model scores highly but which have low semantic similarity to the user.
Utilizes **Logical Relevance**: Prompts an LLM to explicitly reason about the logical connection between a user and item (User-Centric and Item-Centric), constructing a refined interaction graph that includes 'hard' samples while filtering 'noise'.
Uses **Cross-Graph Contrastive Alignment** to enforce consistency between the original interaction graph and the LLM-refined graph, suppressing hallucinations via edge-dropping augmentation.
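The graph refinement idea can be sketched as a set operation over edges. This is a minimal sketch, not the paper's procedure: the function name `refine_graph`, the boolean verdict format, and the rule that unjudged edges are kept are all assumptions made for illustration.

```python
def refine_graph(edges, llm_verdicts):
    """Build a refined edge set from observed edges and LLM relevance verdicts.

    edges:        set of observed (user, item) pairs.
    llm_verdicts: dict mapping (user, item) -> bool, True if the LLM judged
                  the pair logically relevant (hypothetical format).
    """
    # Assumption: observed edges the LLM did not judge are kept unchanged.
    kept = {e for e in edges if llm_verdicts.get(e, True)}
    # Assumption: unobserved pairs judged relevant are added as hard samples.
    added = {e for e, ok in llm_verdicts.items() if ok and e not in edges}
    return kept | added

observed = {("u1", "a"), ("u1", "b")}
verdicts = {("u1", "b"): False, ("u1", "c"): True}
refined = refine_graph(observed, verdicts)
print(sorted(refined))
```

Here `("u1", "b")` is filtered as noise and `("u1", "c")` is added as a hard sample, while the unjudged edge `("u1", "a")` passes through.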

Architecture

The overall LLMHNI framework, illustrating the two parallel modules: Semantic Relevance Guided Hard Negative Mining (left) and Logical Relevance Guided Interaction Denoising (right).

Breakthrough Assessment

7/10

Addresses a subtle but critical failure mode in denoising (hard vs. noise confusion) using a well-structured dual-signal approach (semantic + logical). The added hallucination mitigation (graph contrastive learning with edge dropping) is intended to keep the method robust to unreliable LLM outputs.

⚙️ Technical Details

Problem Definition

Setting: Denoising Recommender System training using implicit feedback

Inputs: User set U, Item set I, Interaction dataset D (containing true positives, false positives, and missing values), Text profiles P_u and P_i

Outputs: Scoring function f_theta(u, i) ranking items for users

Pipeline Flow

LLM Encoder + MLP Projector (Aligns text embeddings)
Hard Negative Miner (Selects negatives using aligned embeddings)
Logical Relevance Inference (LLM rates candidate pairs)
Graph Constructor (Builds refined graph G' with hard samples)
Recommender Training (Optimizes BPR + Denoise Loss + Hallucination Loss)
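The final training step combines three terms. One plausible way to write the joint objective is the weighted sum below, where L_de is the cross-graph alignment (denoise) loss and L_hal is the hallucination loss. The weights \lambda_{de} and \lambda_{hal} are assumptions for illustration; this summary does not state how the terms are balanced.

```latex
\mathcal{L} \;=\; \mathcal{L}_{\mathrm{BPR}}
  \;+\; \lambda_{de}\,\mathcal{L}_{de}
  \;+\; \lambda_{hal}\,\mathcal{L}_{hal}
```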

System Modules

Embedding Projector (Semantic Relevance Module)

Aligns generic LLM text embeddings to the recommendation space

Model or implementation: MLP (Multi-Layer Perceptron)

Hard Negative Sampler (Semantic Relevance Module)

Selects hard negative items for training

Model or implementation: Sampling Algorithm
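A minimal sketch of such a sampler follows, assuming user and item vectors have already been projected into the recommendation space. The similarity cutoff `max_sim`, the ranking by predicted score, and the function name `mine_hard_negatives` are hypothetical choices, not details taken from the paper.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def mine_hard_negatives(user_vec, item_vecs, pred_scores, interacted, k=2, max_sim=0.2):
    """Return up to k non-interacted items with low semantic similarity to the
    user, ranked by the recommender's predicted score (highest first)."""
    candidates = [
        i for i in item_vecs
        if i not in interacted and cosine(user_vec, item_vecs[i]) <= max_sim
    ]
    return sorted(candidates, key=lambda i: pred_scores[i], reverse=True)[:k]

user_vec = [1.0, 0.0]
item_vecs = {"i1": [1.0, 0.0], "i2": [0.0, 1.0], "i3": [0.1, 1.0], "i4": [-1.0, 0.0]}
pred_scores = {"i1": 0.9, "i2": 0.8, "i3": 0.3, "i4": 0.6}
hard_negs = mine_hard_negatives(user_vec, item_vecs, pred_scores, interacted={"i4"})
print(hard_negs)
```

Item `i1` is excluded because it is semantically close to the user, and `i4` because the user already interacted with it.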

Logical Relevance Reasoner

Classifies interactions as Hard or Noisy via LLM reasoning

Model or implementation: Large Language Model (inference only)
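The two-view reasoning step might look like the sketch below. The prompt wording, the yes/no answer format, and the rule that both views must agree are assumptions for illustration; the actual prompts are not given in this summary. No LLM client is called here: replies are passed in as plain strings.

```python
def build_relevance_prompt(user_profile: str, item_profile: str, view: str) -> str:
    """Build a user-centric or item-centric relevance prompt (hypothetical wording)."""
    if view not in ("user", "item"):
        raise ValueError("view must be 'user' or 'item'")
    if view == "user":
        question = "Given this user's preferences, would they plausibly like the item?"
    else:
        question = "Given this item's typical audience, does the user plausibly belong to it?"
    return (
        f"User profile: {user_profile}\n"
        f"Item profile: {item_profile}\n"
        f"{question}\n"
        "Answer with exactly one word: yes or no."
    )

def parse_answer(reply: str) -> bool:
    """Interpret an LLM reply as a yes/no verdict."""
    return reply.strip().lower().startswith("yes")

def is_logically_relevant(user_view_reply: str, item_view_reply: str) -> bool:
    """Assumed combination rule: both views must agree that the pair is relevant."""
    return parse_answer(user_view_reply) and parse_answer(item_view_reply)

prompt = build_relevance_prompt("likes sci-fi novels", "a space-opera paperback", "user")
print(prompt)
```

Requiring agreement between the two views is a conservative choice; a looser rule (either view suffices) would admit more candidate hard samples at the cost of more potential noise.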

Graph Contrastive Learner

Mitigates impact of LLM hallucinations in the refined graph

Model or implementation: GNN-based Recommender
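The edge-dropping augmentation used by this module can be sketched as follows. The drop rate and seeds are hypothetical; the idea is that two randomly perturbed views of the refined graph are contrasted, so no single (possibly hallucinated) edge dominates the learned representation.

```python
import random

def drop_edges(edges, drop_rate, seed=None):
    """Return a copy of `edges` with each edge independently removed
    with probability `drop_rate`."""
    if not 0.0 <= drop_rate <= 1.0:
        raise ValueError("drop_rate must be in [0, 1]")
    rng = random.Random(seed)
    return [e for e in edges if rng.random() >= drop_rate]

refined_edges = [("u1", "a"), ("u1", "c"), ("u2", "b"), ("u2", "c")]
# Two stochastic views of the same graph, to be encoded and contrasted.
view_1 = drop_edges(refined_edges, drop_rate=0.2, seed=0)
view_2 = drop_edges(refined_edges, drop_rate=0.2, seed=1)
```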

Novel Architectural Elements

Dual-auxiliary signal integration: combining vector-space semantic relevance (for negative sampling) with reasoning-space logical relevance (for graph refinement).
Cross-graph contrastive alignment strategy to synchronize the original interaction graph with an LLM-inferred graph while handling hallucinations.

Modeling

Base Model: Generic GNN-based Recommender (backbone) + Unspecified LLM for encoding/reasoning

Training Method: Multi-task learning with BPR loss and Contrastive losses

Objective Functions:

Purpose: Optimize recommendation accuracy.
Formally: BPR Loss (Bayesian Personalized Ranking).

Purpose: Train the projector to align LLM embeddings with RecSys space.
Formally: Contrastive loss maximizing similarity of pseudo-positive pairs.

Purpose: Align representations between original graph and LLM-refined graph.
Formally: Cross-graph contrastive alignment loss (L_de).

Purpose: Suppress hallucinations in LLM-inferred edges.
Formally: Graph contrastive learning loss with random edge dropping (L_hal).
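For reference, the standard BPR loss penalizes each (user, positive item, negative item) triple by -log(sigmoid(s_pos - s_neg)). The sketch below is a plain-Python version of that standard form, without the regularization term; how the paper batches or weights triples is not stated in this summary.

```python
import math

def bpr_loss(pos_scores, neg_scores):
    """Mean of -log(sigmoid(s_pos - s_neg)) over paired scores."""
    def softplus(x):
        # Numerically stable log(1 + exp(x)).
        return max(x, 0.0) + math.log1p(math.exp(-abs(x)))
    # -log(sigmoid(d)) equals softplus(-d), with d = s_pos - s_neg.
    losses = [softplus(n - p) for p, n in zip(pos_scores, neg_scores)]
    return sum(losses) / len(losses)

well_ranked = bpr_loss([5.0], [0.0])   # positive scored above negative
mis_ranked = bpr_loss([0.0], [5.0])    # negative scored above positive
```

The loss shrinks as positives are ranked further above negatives, which is why harder negatives (scored close to or above the positive) contribute larger gradients.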

Key Hyperparameters:

tau: 0.5 (temperature for projector training)
N: 50 (sample quality control for pseudo labels)
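To make the role of the temperature concrete, here is a minimal sketch of a temperature-scaled contrastive loss in the common InfoNCE style. The exact form used to train the projector is not given in this summary, so the InfoNCE formulation is an assumption; only the default `tau=0.5` comes from the hyperparameters above.

```python
import math

def info_nce(sim_pos, sim_negs, tau=0.5):
    """-log( exp(s_pos/tau) / (exp(s_pos/tau) + sum_j exp(s_neg_j/tau)) )."""
    logits = [sim_pos / tau] + [s / tau for s in sim_negs]
    m = max(logits)  # subtract the max for numerical stability
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denom - logits[0]

loss_default = info_nce(0.9, [0.1], tau=0.5)
loss_sharp = info_nce(0.9, [0.1], tau=0.1)
```

A smaller temperature sharpens the softmax, so a pair that is already well separated yields a much smaller loss, while hard negatives are penalized more strongly.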

Compute: Not reported in the paper

Comparison to Prior Work

vs. Standard Denoising: LLMHNI distinguishes between hard and noisy samples using external LLM signals rather than just internal loss patterns.
vs. Existing LLM-Rec methods: LLMHNI specifically targets the hard/noise confusion problem and includes an objective alignment step for embeddings.

Limitations

Reliance on LLM inference for logical relevance could be computationally expensive for very large interaction graphs (mitigated by pre-selection).
Performance depends on the quality of the LLM's reasoning and the richness of text profiles.
Potential for LLM hallucination to introduce new noise (mitigated by contrastive learning, but risk remains).

Reproducibility

Code: https://github.com/TianRui-Song717/LLMHNI

Code is publicly available. The specific LLM used for encoding and inference is not explicitly named in the provided text snippet. Training datasets are mentioned as 'three real-world datasets' but not named in the snippet.

📊 Experiments & Results

Evaluation Setup

Not reported in the provided paper text

Metrics: Not reported in the provided paper text

Statistical methodology: Not reported in the paper

Main Takeaways

The paper identifies a critical flaw in existing denoising methods: the confusion between 'hard' samples and 'noisy' samples due to similar loss patterns.
The proposed LLMHNI framework is designed to separate these samples using two auxiliary signals: semantic relevance (via aligned embeddings) and logical relevance (via LLM reasoning).
A specific alignment module is necessary to adapt general-purpose LLM embeddings to the specific objective of user preference modeling.
Graph contrastive learning with edge dropping is employed to robustly handle potentially unreliable (hallucinated) interactions suggested by the LLM.

📚 Prerequisite Knowledge

Prerequisites

Implicit Feedback in Recommender Systems
Bayesian Personalized Ranking (BPR) Loss
Contrastive Learning
Graph Neural Networks (GNN)

Key Terms

Hard Samples: Data points that are difficult for the model to classify correctly (high loss) but contain valuable information for refining the decision boundary.

Noisy Samples: Data points with incorrect labels (e.g., accidental clicks) that confuse the model and should be removed or down-weighted.

False Positive Noise: Items interacted with by the user that do not reflect actual preference (e.g., misclicks).

False Negative Noise: Items not interacted with by the user despite being preferred (e.g., due to position bias).

Semantic Relevance: Similarity between user and item derived from their textual descriptions/embeddings.

Logical Relevance: Reasoning-based assessment of whether an item logically fits a user's past behavior and preferences.

Objective Alignment: Projecting embeddings from one task space (e.g., language modeling) to another (e.g., recommendation) to ensure compatibility.