
Catalog-Native LLM: Speaking Item-ID Dialect with Less Entanglement for Recommendation

R Shirkavand, X Wei, C Wang, Z Hui, H Huang…
University of Maryland - College Park, University of Cambridge
arXiv, 9/2025
Recommendation, P13N, Pretraining

📝 Paper Summary

LLM-based recommendation, Generative recommendation
IDIOMoE splits LLM Feed-Forward Networks into separate text and item-ID experts using token-type gating, enabling effective collaborative filtering without degrading the model's natural language understanding.
Core Problem
Integrating collaborative filtering signals (item IDs) into LLMs often causes 'knowledge interference,' where the model's semantic language capabilities degrade and recommendation accuracy suffers due to entangled representations.
Why it matters:
  • Modern systems need to combine the accuracy of collaborative filtering with the reasoning and conversational abilities of LLMs.
  • Naive approaches that simply mix ID tokens and text tokens into a shared model lead to suboptimal performance on both tasks.
  • Scaling parameters alone does not solve the fundamental interference between opaque ID patterns and rich semantic text.
Concrete Example: When a standard LLM is trained on mixed sequences of text and item IDs (e.g., 'User bought <item_53>'), the shared parameters struggle to model the ID co-occurrence patterns without forgetting general language knowledge. The paper shows a baseline 'Item-LLM' improving recommendation but suffering on language benchmarks (e.g., higher perplexity on WikiText).
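To make the mixed-sequence example concrete, here is a minimal sketch of how such a sequence could be split into tokens and labeled by token type. The `<item_N>` format follows the example above; the regex tokenizer and the function names are illustrative assumptions, not the paper's actual tokenizer.

```python
import re

# Assumed ID-token format, taken from the '<item_53>' example; hypothetical otherwise.
ITEM_TOKEN = re.compile(r"<item_\d+>")
# Try item tokens first, then runs of non-space text, then a stray '<'.
TOKEN_PATTERN = re.compile(r"<item_\d+>|[^\s<]+|<")


def tokenize_mixed(sequence: str) -> list[str]:
    """Split a mixed text/item-ID string into coarse tokens (toy tokenizer)."""
    return TOKEN_PATTERN.findall(sequence)


def token_types(tokens: list[str]) -> list[int]:
    """Return 1 for item-ID tokens and 0 for every other token."""
    return [1 if ITEM_TOKEN.fullmatch(tok) else 0 for tok in tokens]


tokens = tokenize_mixed("User bought <item_53> then <item_7>")
types = token_types(tokens)
print(list(zip(tokens, types)))
```

In a shared-parameter baseline these labels are ignored and every token passes through the same weights, which is where the interference described above arises.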
Key Novelty
Token-Type Mixture-of-Experts (IDIOMoE)
  • Treats item interaction histories as a distinct 'dialect' separate from natural language.
  • Replaces each Transformer Feed-Forward Network (FFN) with two experts: a frozen 'Text Expert' and a trainable 'Item Expert'.
  • Uses a static gate based on token type to route item ID tokens to the Item Expert and all other tokens to the Text Expert, preventing destructive interference.
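The routing rule in the bullets above can be sketched in a few lines. This is a toy illustration under stated assumptions: hidden states are plain Python lists, the two "experts" are stand-in scaling functions rather than real FFNs, and all names (`make_toy_expert`, `token_type_moe`) are hypothetical. Freezing the Text Expert and training the Item Expert are not modeled here.

```python
from typing import Callable, Sequence

Vector = list[float]
Expert = Callable[[Vector], Vector]


def make_toy_expert(scale: float) -> Expert:
    """Stand-in for an FFN: scales each hidden dimension (illustration only)."""
    def expert(hidden: Vector) -> Vector:
        return [scale * h for h in hidden]
    return expert


def token_type_moe(
    hidden_states: Sequence[Vector],
    is_item: Sequence[bool],
    text_expert: Expert,
    item_expert: Expert,
) -> list[Vector]:
    """Static gate: item-ID positions go to item_expert, all others to text_expert."""
    if len(hidden_states) != len(is_item):
        raise ValueError("hidden_states and is_item must have the same length")
    return [
        item_expert(h) if flag else text_expert(h)
        for h, flag in zip(hidden_states, is_item)
    ]


text_expert = make_toy_expert(1.0)   # plays the role of the Text Expert
item_expert = make_toy_expert(-1.0)  # plays the role of the Item Expert

hidden = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
mask = [False, True, False]          # only the middle token is an item ID
routed = token_type_moe(hidden, mask, text_expert, item_expert)
print(routed)
```

Because the gate depends only on token type, there is no learned router: each position is processed by exactly one expert, so item-ID tokens never pass through the text expert's weights.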
Architecture
Figure 2: The IDIOMoE architecture replaces the standard FFN with a Mixture-of-Experts module containing a Text Expert and an Item Expert, controlled by a Token-Type Gate.
Evaluation Highlights
  • Improves NDCG@10 by 27.1% over SASRec on a large-scale proprietary industrial dataset (hundreds of millions of users).
  • Achieves the best performance among LLM-based methods on Amazon Books and Toys datasets, surpassing baselines like Item-LLM and Text-Attr LLM.
  • Preserves pre-trained language capabilities, achieving substantially lower negative log-likelihood on WikiText than text-derived bias baselines.
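For readers unfamiliar with the metric in the first highlight, the sketch below computes NDCG@k for next-item prediction with a single ground-truth item, plus a relative-improvement helper. The single-target setup and the function names are assumptions; the summary does not state the paper's exact evaluation protocol or whether the reported gain is relative. The numbers in the usage lines are invented for illustration and are not results from the paper.

```python
import math
from typing import Hashable, Sequence


def ndcg_at_k(ranked_items: Sequence[Hashable], target: Hashable, k: int = 10) -> float:
    """NDCG@k with one relevant item.

    The ideal DCG is 1 (target at rank 1), so NDCG reduces to
    1 / log2(rank + 1) when the target is in the top k, else 0.
    """
    for rank, item in enumerate(ranked_items[:k], start=1):
        if item == target:
            return 1.0 / math.log2(rank + 1)
    return 0.0


def relative_improvement(new: float, baseline: float) -> float:
    """Percentage gain of `new` over `baseline`."""
    return 100.0 * (new - baseline) / baseline


# Made-up ranking: the held-out item appears at rank 2.
score = ndcg_at_k(["item_9", "item_53", "item_7"], target="item_53")
# Made-up metric values, only to show the arithmetic.
gain = relative_improvement(new=0.12, baseline=0.10)
print(score, gain)
```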
Breakthrough Assessment
8/10
Proposes a clean architectural solution to the well-known 'semantic-collaborative gap' in recommender systems. The method is intuitive, effective on large-scale industrial data, and offers a strong balance between recommendation accuracy and language preservation.