
Decoding Matters: Addressing Amplification Bias and Homogeneity Issue for LLM-based Recommendation

Keqin Bao, Jizhi Zhang, Yang Zhang, Xinyue Huo, Chong Chen, Fuli Feng
University of Science and Technology of China, Huawei Inc., National University of Singapore
arXiv (2024)
Recommendation P13N

📝 Paper Summary

LLM-based Recommendation (RecLLM) Decoding Strategies
The paper proposes D3, a decoding strategy for recommender LLMs that removes length normalization to fix score inflation from deterministic 'ghost tokens' and integrates a text-free model to reduce homogeneity.
Core Problem
Standard LLM decoding strategies (like beam search) applied to recommendation suffer from score inflation due to length normalization on deterministic tokens and produce repetitive, homogeneous outputs.
Why it matters:
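  • Standard length-normalized decoding inflates the scores of items containing 'ghost tokens' (tokens with probability ≈ 1): such tokens add almost nothing to the summed log-probability but still increase the length used for normalization, which distorts rankings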
  • LLMs tend to generate items textually similar to each other or the user's history (e.g., 'PlayStation 3' and 'PlayStation 4'), reducing diversity
  • Current approaches prioritize training enhancements while overlooking the critical impact of the decoding phase on recommendation quality
Concrete Example: When suggesting products, an LLM might recommend 'PlayStation 3' and 'PlayStation 4' solely because they share similar text structures, or it might repetitively copy features from the user's history due to the match-and-copy mechanism.
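The score inflation described above can be checked with a small worked example. The sketch below is illustrative only: the token probabilities are invented, and length normalization is assumed to mean dividing the summed log-probability by the number of tokens, as is common in beam search.

```python
import math

def sequence_score(token_probs, length_normalize):
    """Sum of token log-probabilities, optionally divided by sequence length."""
    total = sum(math.log(p) for p in token_probs)
    return total / len(token_probs) if length_normalize else total

# Hypothetical token probabilities for two candidate items (not from the paper).
# Both items share the same two "real" decisions (0.5 and 0.4);
# item_b additionally ends with three ghost tokens of probability ~1.
item_a = [0.5, 0.4]
item_b = [0.5, 0.4, 0.999, 0.999, 0.999]

norm_a = sequence_score(item_a, length_normalize=True)
norm_b = sequence_score(item_b, length_normalize=True)
raw_a = sequence_score(item_a, length_normalize=False)
raw_b = sequence_score(item_b, length_normalize=False)

print(f"normalized: a={norm_a:.3f}, b={norm_b:.3f}")
print(f"raw sum:    a={raw_a:.3f}, b={raw_b:.3f}")
```

With normalization, item_b scores noticeably higher than item_a even though the model was no more confident about it; without normalization, the two scores are nearly identical.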
Key Novelty
Debiasing-Diversifying Decoding (D3)
  • Identifies 'ghost tokens' (deterministic tokens) that cause score inflation under length normalization, and mitigates this by removing length normalization entirely (once ghost tokens are discounted, item lengths are roughly uniform, so normalization is no longer needed)
  • Addresses homogeneity by incorporating scores from a 'text-free' assistant model (such as a collaborative-filtering model) during decoding to guide the LLM toward diverse, non-repetitive items
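A minimal sketch of the two ideas above, under stated assumptions. The summary does not give the paper's exact scoring formula, so the log-linear combination, the weight `alpha`, the function names, and all probabilities below are hypothetical. For simplicity the sketch scores whole items, whereas the summary describes the assistant model being applied during decoding.

```python
import math

def d3_style_score(llm_token_probs, assistant_prob, alpha=1.0):
    """Hypothetical score: un-normalized LLM log-probability plus a weighted
    log-score from a text-free assistant model."""
    llm_score = sum(math.log(p) for p in llm_token_probs)  # no length normalization
    return llm_score + alpha * math.log(assistant_prob)

def rank(candidates, alpha):
    """Return candidate names sorted from best to worst score."""
    return sorted(
        candidates,
        key=lambda name: d3_style_score(
            candidates[name]["llm"], candidates[name]["assistant"], alpha
        ),
        reverse=True,
    )

# Invented numbers: the LLM slightly prefers the item that is textually close
# to the user's history, while the text-free model prefers the other item.
candidates = {
    "PlayStation 4": {"llm": [0.6, 0.9], "assistant": 0.05},
    "Wireless Headset": {"llm": [0.5, 0.9], "assistant": 0.30},
}

llm_only = rank(candidates, alpha=0.0)
combined = rank(candidates, alpha=1.0)
print("LLM only:", llm_only)
print("Combined:", combined)
```

Setting `alpha=0.0` reduces the score to the plain summed LLM log-probability, which makes it easy to compare rankings with and without the assistant model's contribution.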
Breakthrough Assessment
7/10
Identifies a specific, overlooked structural problem in applying NLP decoding to recommendation (ghost tokens) and proposes a logical, lightweight fix. The score is limited because the source text provides no experimental results.