
VoteGCL: Enhancing Graph-based Recommendations with Majority-Voting LLM-Rerank Augmentation

Minh-Anh Nguyen, Bao Nguyen, Ha Lan N. T., Tuan Anh Hoang, Duc-Trong Le, Dung D. Le
College of Engineering and Computer Science, VinUniversity, University of Fredericksburg, Ho Chi Minh City University of Technology
arXiv (2025)
Recommendation P13N KG

📝 Paper Summary

Graph-based Recommendation Systems, LLM-based Data Augmentation
VoteGCL augments graph recommendation data by having a Large Language Model repeatedly rerank candidate items and aggregating the results via majority voting, so that only high-confidence synthetic interactions are added.
Core Problem
Graph-based recommendation systems suffer from data sparsity and popularity bias, while existing LLM-based augmentation methods produce inconsistent results due to stochastic generation and misaligned embeddings.
Why it matters:
  • Data sparsity limits the effectiveness of collaborative filtering, leading to poor recommendations for users with few interactions (cold start)
  • Directly using LLM-generated embeddings often causes distributional shifts that degrade performance when integrated with collaborative signals
  • Existing LLM augmentation is unstable: because LLM outputs are probabilistic, single inference runs yield fluctuating results (e.g., NDCG varies across repeated runs on Netflix)
Concrete Example: When prompting an LLM to predict user preferences, one run might rank 'Inception' high while another ranks it low purely due to randomness. Methods that rely on a single pass inherit this noise. VoteGCL runs the ranking N times; if 'Inception' appears at the top in most runs, it is added as a synthetic edge with high confidence, reducing noise.
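The "appears at the top in most runs" rule from the example can be sketched as a plain vote count. This is a minimal illustration, not the paper's aggregation (which uses rank fusion): the function name `majority_top_k`, the `top_k` cutoff, the strict-majority threshold, and the toy runs are all hypothetical.

```python
from collections import Counter


def majority_top_k(rankings, top_k=1):
    """Return items that land in the top_k of MORE than half of the runs.

    rankings: list of LLM runs, each an ordering of item ids (best first).
    The strict-majority threshold is an illustrative assumption.
    """
    counts = Counter(item for ranking in rankings for item in ranking[:top_k])
    threshold = len(rankings) / 2
    return sorted(item for item, c in counts.items() if c > threshold)


# Five hypothetical reranking runs for one user; 'Inception' tops four of them.
runs = [
    ["Inception", "Heat", "Up"],
    ["Inception", "Up", "Heat"],
    ["Heat", "Inception", "Up"],
    ["Inception", "Heat", "Up"],
    ["Inception", "Up", "Heat"],
]
print(majority_top_k(runs, top_k=1))  # ['Inception']
```

The single noisy run that ranked 'Heat' first is outvoted, which is the noise-filtering effect the example describes.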
Key Novelty
VoteGCL: Majority-Voting LLM-Rerank Augmentation
  • Reformulates augmentation as a reranking task where an LLM orders candidate items multiple times, aggregating results via a simplified Reciprocal Rank Fusion (RRF) to filter out stochastic noise
  • Integrates these high-confidence synthetic interactions into a Graph Contrastive Learning framework, aligning the original and augmented graph views to mitigate popularity bias without needing complex embedding alignment
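Reciprocal Rank Fusion scores each item by summing 1/(k + rank) over all runs, so items that are consistently near the top accumulate the most weight. The summary does not specify what "simplified" means; the sketch below assumes it is standard RRF with the smoothing constant k defaulting to 0 (score = 1/rank). The function name `rrf_aggregate`, the `top_n` cutoff, and the toy runs are hypothetical.

```python
from collections import defaultdict


def rrf_aggregate(rankings, k=0, top_n=3):
    """Fuse several rankings of the same candidates with Reciprocal Rank Fusion.

    rankings: list of LLM runs, each an ordering of item ids (best first).
    k: smoothing constant; k=0 is an ASSUMED reading of the 'simplified' RRF
       (classic RRF often uses k=60).
    Returns the top_n items by fused score; ties are broken by item id.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    fused = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in fused[:top_n]]


# Three hypothetical LLM reranking runs over the same four candidates.
runs = [
    ["Inception", "Up", "Heat", "Tenet"],
    ["Up", "Inception", "Tenet", "Heat"],
    ["Inception", "Heat", "Up", "Tenet"],
]
# Fused scores with k=0: Inception 2.5, Up ~1.83, Heat ~1.08, Tenet ~0.83
print(rrf_aggregate(runs, k=0, top_n=2))  # ['Inception', 'Up']
```

Compared with a pure top-1 vote, rank fusion also rewards items that are consistently second or third, which uses more of the information in each LLM ordering.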
Architecture
Figure 3
The overall VoteGCL framework, illustrating the two-stage process: Data Augmentation via Majority-Vote Reranking and Graph Contrastive Learning.
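The Graph Contrastive Learning stage pulls together each node's embeddings from the original and augmented graph views while pushing apart embeddings of different nodes. The sketch below uses the standard InfoNCE objective common in SimGCL-style methods; whether VoteGCL uses exactly this form, and the temperature `tau=0.2`, are assumptions, and `info_nce` is a hypothetical helper written in pure Python for clarity rather than speed.

```python
import math


def _normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(view_a, view_b, tau=0.2):
    """InfoNCE loss between two views' embeddings of the same nodes.

    view_a[i] and view_b[i] embed node i in the original and augmented
    graph views (positive pair); every view_b[j] with j != i is a negative.
    """
    a = [_normalize(v) for v in view_a]
    b = [_normalize(v) for v in view_b]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity to every node in the other view, scaled by 1/tau.
        logits = [sum(x * y for x, y in zip(ai, bj)) / tau for bj in b]
        m = max(logits)  # subtract the max before exp for numerical stability
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]  # -log softmax of the positive pair
    return total / len(a)


original = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
aligned = [[0.9, 0.1], [0.1, 0.9], [-0.9, 0.0]]
shuffled = [[0.0, 1.0], [-1.0, 0.0], [1.0, 0.0]]
print(info_nce(original, aligned))   # small: views agree node by node
print(info_nce(original, shuffled))  # large: positives are mismatched
```

In practice this term would typically be added, with a weighting coefficient, to a ranking loss such as BPR on the user-item graph.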
Evaluation Highlights
  • Outperforms state-of-the-art baselines (e.g., LightGCN, SimGCL) on the Netflix dataset, with a +5.79% improvement in NDCG@20
  • Reduces popularity bias significantly, lowering popularity consumption by ~40% on the Amazon Book dataset compared to LightGCN
  • Demonstrates robustness to noise, maintaining performance gains as the number of voting rounds (N) increases, consistent with the theoretical concentration-of-measure guarantees
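The concentration-of-measure intuition can be illustrated with a textbook model, which is an assumption here rather than the paper's exact theorem: if each of N independent LLM runs votes for a truly relevant item with probability p > 1/2, the chance that the majority vote rejects it is at most exp(-2N(p - 1/2)^2) by Hoeffding's inequality. Real LLM runs are not guaranteed to be independent. The helper names below are hypothetical; the code compares the exact binomial failure probability against the bound.

```python
import math


def majority_error_exact(p, n):
    """P(at most half of n independent votes are correct), each correct w.p. p."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n // 2 + 1))


def majority_error_hoeffding(p, n):
    """Hoeffding upper bound exp(-2 n (p - 1/2)^2) on that probability (p > 1/2)."""
    return math.exp(-2 * n * (p - 0.5) ** 2)


# With a per-run accuracy of p = 0.7 (illustrative), both quantities shrink with N.
for n in (1, 5, 11, 21):
    print(n, round(majority_error_exact(0.7, n), 4), round(majority_error_hoeffding(0.7, n), 4))
```

The exponential decay in N is why additional voting rounds should make the accepted synthetic edges more stable rather than noisier.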
Breakthrough Assessment
7/10
A solid methodological improvement that addresses the specific instability of LLM generation in recommender systems. The theoretical grounding via concentration of measure adds rigor, though the core components (LLM reranking and contrastive learning) are established techniques, here combined in a novel way.