Item-side Fairness of Large Language Model-based Recommendation System

Meng Jiang, Keqin Bao, Jizhi Zhang, Wenjie Wang, Zhengyi Yang, Fuli Feng, Xiangnan He
University of Science and Technology of China, National University of Singapore
arXiv (2024)
Recommendation P13N Benchmark

📝 Paper Summary

LLM-based Recommendation Systems (LRS) | Trustworthy Recommendation
The paper reveals that LLM-based recommendation systems exhibit severe item-side unfairness due to popularity and semantic biases, and proposes the IFairLRS framework to mitigate this via training reweighting and inference reranking.
Core Problem
LLM-based Recommendation Systems (LRS) suffer from significant item-side unfairness, over-recommending popular items and specific genres due to biases in interaction history and the LLM's pre-trained semantic priors.
Why it matters:
  • Fair exposure is critical for the economic rights of item producers (e.g., job candidates, micro-businesses) and the visibility of content related to vulnerable populations
  • Existing fairness methods for conventional discriminative models do not directly apply to generative LRS, which rely on instruction tuning and text generation
  • Prior work such as LLMRank observed popularity bias only qualitatively; a comprehensive quantitative investigation and a mitigation framework for LRS were lacking
Concrete Example: In a movie recommendation scenario, an LRS such as BIGRec might recommend 'The Mighty Ducks' (a popular comedy) even when the 'Comedy' genre has been removed from the fine-tuning data. This shows that the model draws on semantic priors from pre-training and not only on user history.
Key Novelty
IFairLRS Framework (In-learning Reweighting + Post-learning Reranking)
  • Conducts the first comprehensive quantitative audit of item-side fairness in LRS, distinguishing between biases arising from historical interactions (popularity) and biases from LLM semantic priors (genres)
  • Proposes a two-stage mitigation framework: 'In-learning' reweights training samples to balance the target item distribution, and 'Post-learning' reranks outputs to penalize unfairness
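As a rough illustration of the two stages described above, the following Python sketch shows one way sample reweighting and penalty-based reranking could look. It is not the authors' implementation: the function names, the inverse-frequency weighting scheme, the `exposure_gap` input, and the `alpha` penalty strength are all assumptions made for this example.

```python
from collections import Counter


def group_balanced_weights(target_groups):
    """Weight each training sample inversely to its target group's frequency.

    Assumed scheme: every group ends up with the same total weight,
    and the weights sum to the number of samples.
    """
    counts = Counter(target_groups)
    n_samples, n_groups = len(target_groups), len(counts)
    return [n_samples / (n_groups * counts[g]) for g in target_groups]


def rerank_with_penalty(candidates, exposure_gap, alpha=1.0):
    """Re-sort (item, group, score) candidates after penalizing over-exposed groups.

    exposure_gap[g] > 0 is assumed to mean that group g is currently
    over-recommended; alpha (hypothetical) controls the penalty strength.
    """
    adjusted = [
        (item, score - alpha * exposure_gap.get(group, 0.0))
        for item, group, score in candidates
    ]
    return sorted(adjusted, key=lambda pair: pair[1], reverse=True)


# Toy data: three samples target a popular item, one targets a niche item.
weights = group_balanced_weights(["popular", "popular", "popular", "niche"])

# Toy candidates: the popular item scores higher before the penalty is applied.
reranked = rerank_with_penalty(
    [("A", "popular", 0.9), ("B", "niche", 0.8)],
    {"popular": 0.2, "niche": -0.1},
)
```

In this toy run the niche sample receives a larger training weight than each popular sample, and the niche candidate moves ahead of the popular one after the penalty.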
Evaluation Highlights
  • Comparison with SASRec (Self-Attentive Sequential Recommendation) reveals LRS is significantly more influenced by popularity bias, consistently recommending more popular items
  • Probing experiments show LRS recommends item genres never seen during fine-tuning, indicating that unfairness stems partly from pre-trained semantic knowledge, not just interaction data
  • Analysis of 'grounding' (mapping generated text to items) shows it mitigates some inherent unfairness but transfers bias from low-popularity to high-popularity groups
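The kind of group-level audit these findings rely on can be sketched by comparing each group's share of recommendations with its share of the interaction history. The snippet below is a minimal illustration under assumptions: the `exposure_gap` measure and both function names are hypothetical and are not claimed to be the metric used in the paper.

```python
from collections import Counter


def group_shares(groups):
    """Return the fraction of entries that belong to each group."""
    counts = Counter(groups)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}


def exposure_gap(history_groups, recommended_groups):
    """Per-group recommendation share minus history share (hypothetical measure).

    A positive value suggests the group is over-recommended relative to
    the history; a negative value suggests it is under-recommended.
    """
    hist = group_shares(history_groups)
    rec = group_shares(recommended_groups)
    return {g: rec.get(g, 0.0) - hist.get(g, 0.0) for g in set(hist) | set(rec)}


# Popularity groups: popular items take a larger share of recommendations
# than of the history in this toy data.
popularity_gap = exposure_gap(
    ["popular"] * 6 + ["niche"] * 4,
    ["popular"] * 9 + ["niche"] * 1,
)

# Genre groups: "comedy" never appears in the fine-tuning history,
# so any recommendation of it shows up as a positive gap.
genre_gap = exposure_gap(["drama"] * 4, ["drama"] * 3 + ["comedy"])
```

A group that is absent from the history but present in the recommendations, like "comedy" in the second call, gets a gap equal to its full recommendation share, which mirrors the probing setup in which a genre is removed from the fine-tuning data.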
Breakthrough Assessment
7/10
Significant for identifying that LRS fairness issues stem from pre-training priors, not just data imbalance. The proposed solution framework is standard (reweighting/reranking) but applied to a novel domain (LRS).