IDIOMoE: Item-ID + Oral-language Mixture-of-Experts Language Model—the proposed architecture splitting FFNs into text and item experts.
Collaborative Filtering (CF): A recommendation technique that predicts user preferences based on past interactions (e.g., 'people who bought X also bought Y'), relying on ID patterns rather than item content.
Feed-Forward Network (FFN): A component within a Transformer block that applies the same transformation to each token position independently; in this paper, interpreted as a key-value memory.
Token-type gating: A routing mechanism that sends each token to a specific expert based on its type (e.g., Item ID vs. Text) rather than through a learned router.
Knowledge interference: The phenomenon where learning new task-specific patterns (like ID sequences) degrades a model's performance on its original pre-training task (language modeling).
NDCG: Normalized Discounted Cumulative Gain—a measure of ranking quality that accounts for the position of relevant items in the recommendation list.
HR: Hit Rate—the fraction of test cases where the target item appears in the top-K recommendations.
MRR: Mean Reciprocal Rank, the average over test cases of the reciprocal rank (1/rank) of the first relevant item, so a hit at rank 1 scores 1 and a hit at rank 4 scores 0.25.
SASRec: Self-Attentive Sequential Recommendation—a baseline model using self-attention to capture sequential patterns in user actions.
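
The token-type gating entry above can be illustrated with a minimal sketch. It assumes two small ReLU feed-forward experts with random weights and a per-token boolean flag marking item-ID tokens. The names `make_ffn`, `text_expert`, `item_expert`, and `gated_ffn` are hypothetical, and the sizes are arbitrary; none of this is taken from the paper's implementation.

```python
import random

def make_ffn(d_model, d_hidden, seed):
    """Build a toy two-layer ReLU FFN with fixed random weights (illustrative only)."""
    rnd = random.Random(seed)
    w1 = [[rnd.gauss(0, 0.1) for _ in range(d_hidden)] for _ in range(d_model)]
    w2 = [[rnd.gauss(0, 0.1) for _ in range(d_model)] for _ in range(d_hidden)]

    def ffn(x):
        # Position-wise: the output depends only on this one token vector.
        hidden = [max(0.0, sum(x[i] * w1[i][j] for i in range(d_model)))
                  for j in range(d_hidden)]
        return [sum(hidden[j] * w2[j][i] for j in range(d_hidden))
                for i in range(d_model)]

    return ffn

# Two separate experts: one for text tokens, one for item-ID tokens (assumed sizes).
text_expert = make_ffn(d_model=4, d_hidden=8, seed=0)
item_expert = make_ffn(d_model=4, d_hidden=8, seed=1)

def gated_ffn(tokens, is_item):
    """Route each token by its type flag; no router weights are learned."""
    return [item_expert(x) if item else text_expert(x)
            for x, item in zip(tokens, is_item)]

tokens = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
is_item = [False, True]  # first token is text, second is an item ID
outputs = gated_ffn(tokens, is_item)
```

Because the routing decision is a fixed function of the token type, each token always reaches the same expert, which is the property that distinguishes this scheme from a learned mixture-of-experts router.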
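
The three ranking metrics defined above (HR, NDCG, MRR) can be sketched as follows. This assumes the common next-item setting with exactly one target item per test case, in which the ideal DCG is 1 and NDCG@K reduces to 1/log2(rank + 1). The function names are hypothetical, and MRR is left untruncated here, which may differ from the cutoff used in any particular evaluation.

```python
import math

def rank_of_target(ranked_items, target):
    """Return the 1-based rank of target in ranked_items, or None if absent."""
    try:
        return ranked_items.index(target) + 1
    except ValueError:
        return None

def hit_at_k(rank, k):
    # 1 if the target appears within the top-K, else 0.
    return 1.0 if rank is not None and rank <= k else 0.0

def ndcg_at_k(rank, k):
    # Single relevant item: DCG = 1/log2(rank + 1) and ideal DCG = 1.
    if rank is None or rank > k:
        return 0.0
    return 1.0 / math.log2(rank + 1)

def reciprocal_rank(rank):
    # Assumption: no cutoff; a missing target contributes 0.
    return 1.0 / rank if rank is not None else 0.0

def evaluate(cases, k):
    """cases: list of (ranked_items, target) pairs. Returns mean HR@K, NDCG@K, MRR."""
    ranks = [rank_of_target(items, target) for items, target in cases]
    n = len(ranks)
    return {
        "HR": sum(hit_at_k(r, k) for r in ranks) / n,
        "NDCG": sum(ndcg_at_k(r, k) for r in ranks) / n,
        "MRR": sum(reciprocal_rank(r) for r in ranks) / n,
    }

cases = [
    (["a", "b", "c"], "a"),  # target at rank 1
    (["a", "b", "c"], "c"),  # target at rank 3, outside the top-2
    (["a", "b", "c"], "z"),  # target not recommended at all
]
metrics = evaluate(cases, k=2)
```

With K = 2, only the first case counts as a hit, so HR@2 and NDCG@2 both equal 1/3, while the untruncated MRR is (1 + 1/3 + 0) / 3 = 4/9.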