Central South University,
City University of Macau,
Zhejiang University,
Alibaba Group
arXiv
(2025)
Recommendation, P13N
📝 Paper Summary
Cold-start Recommendation, LLM for Recommendation
FilterLLM replaces the slow pairwise "Text-to-Judgment" paradigm with a "Text-to-Distribution" approach, treating users as vocabulary tokens to predict interaction probabilities for billions of users in a single LLM inference.
Core Problem
Existing LLM-based cold-start methods use a 'Text-to-Judgment' paradigm that evaluates user-item pairs sequentially, leading to linear computational costs and necessitating small, pre-filtered candidate sets that limit performance.
Why it matters:
Billion-scale platforms publish thousands of new items per second, requiring initial embeddings to be generated instantly, which sequential processing cannot handle
Pre-filtering candidates (e.g., to hundreds of users) restricts the LLM's scope, preventing it from discovering suitable users outside the small sample
Sequential inference limits user context to a partial subset of history due to context window constraints
Concrete Example: To predict interactions for a new item across 1 million users, a 'Text-to-Judgment' model requires 1 million separate inference calls (inputting 'User X, Item Y: Interact?'). FilterLLM achieves this in a single call by outputting a probability distribution over 1 million user tokens.
Key Novelty
Text-to-Distribution Paradigm with User-Vocabulary
Extends the LLM's vocabulary by assigning a unique token to every user, enabling the model to 'speak' user IDs directly as output probabilities
Initializes these user tokens using collaborative filtering embeddings to align semantic space with behavioral space before fine-tuning
Transforms the recommendation task from binary classification of pairs to a next-token prediction task over the user set
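The Text-to-Distribution idea above can be illustrated with a minimal sketch. It assumes the LLM has already encoded the item text into a hidden vector and that each user token has an embedding of the same size; the function names, the toy vectors, and the plain dot-product scoring are illustrative assumptions, not the paper's implementation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def user_distribution(item_hidden, user_token_embeddings):
    """Score every user token against the item's hidden state in one pass,
    then normalize the scores into P(u | c_i) over the whole user vocabulary."""
    logits = [
        sum(h * e for h, e in zip(item_hidden, emb))
        for emb in user_token_embeddings
    ]
    return softmax(logits)

# Hypothetical toy values: 4 user tokens, hidden size 3.
user_token_embeddings = [
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.2],
    [0.5, 0.5, 0.5],
    [-0.3, 0.2, 0.8],
]
# Stand-in for the LLM's hidden state after reading the cold item's text.
item_hidden = [1.0, 0.0, 0.5]

probs = user_distribution(item_hidden, user_token_embeddings)
best_user = max(range(len(probs)), key=probs.__getitem__)
```

The point of the sketch is that one encoding of the item yields a score for every user at once, instead of one LLM call per user-item pair.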
Architecture
The overall framework of FilterLLM, illustrating the User Vocabulary Construction and the Distribution Prediction process.
Evaluation Highlights
Achieves over 30 times higher efficiency than the state-of-the-art method ColdLLM
Processed over one billion cold items during a two-month deployment on the Alibaba platform
Online A/B testing validates effectiveness in a real-world billion-scale system
Breakthrough Assessment
8/10
The shift from linear 'Text-to-Judgment' to constant-time 'Text-to-Distribution' via user tokens is a significant architectural optimization for industrial deployment, addressing a critical bottleneck in LLM-based RecSys.
⚙️ Technical Details
Problem Definition
Setting: Cold-start Item Recommendation via Interaction Simulation
Inputs: Content features of a cold item c_i
Outputs: Probability distribution P(u|c_i) over the entire user set U
Pipeline Flow
Item Content Encoding (LLM) → User Distribution Prediction (Head) → Interaction Sampling → Cold Embedding Update
System Modules
Item Encoder
Encodes item text content into a hidden representation
Model or implementation: Large Language Model (backbone not specified in text)
Distribution Head
Maps the hidden state to a probability distribution over the user vocabulary
Model or implementation: Linear layer tied to User Vocabulary Embeddings
Sampler
Samples potential users from the predicted distribution to simulate interactions
Model or implementation: Sampling Strategy (e.g., Top-k or probabilistic)
Embedding Updater
Updates the cold item's embedding using the simulated interactions
Model or implementation: Optimization Function (Opt)
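The last two modules (Sampler and Embedding Updater) can be sketched as follows. The summary does not specify the sampling strategy or the optimization function Opt, so this sketch shows both a top-k and a probabilistic sampler and uses a simple mean of the sampled users' embeddings as a stand-in for Opt. All function names and the averaging rule are assumptions made for illustration.

```python
import random

def top_k_users(probs, k):
    """Deterministic option: the k users with the highest predicted probability."""
    return sorted(range(len(probs)), key=lambda u: probs[u], reverse=True)[:k]

def sample_users(probs, k, seed=0):
    """Probabilistic option: draw k user ids (with replacement) from P(u | c_i)."""
    rng = random.Random(seed)
    return rng.choices(range(len(probs)), weights=probs, k=k)

def update_cold_embedding(user_ids, user_embeddings):
    """Stand-in for Opt (an assumption): average the embeddings of the
    simulated interacting users to obtain the cold item's embedding."""
    dim = len(user_embeddings[0])
    return [
        sum(user_embeddings[u][d] for u in user_ids) / len(user_ids)
        for d in range(dim)
    ]

# Hypothetical toy data: 3 users with 2-dimensional behavioral embeddings.
user_embeddings = [[0.0, 0.0], [1.0, 3.0], [3.0, 1.0]]
predicted = [0.1, 0.6, 0.3]  # output of the distribution head for one cold item

simulated_users = top_k_users(predicted, k=2)
cold_item_embedding = update_cold_embedding(simulated_users, user_embeddings)
```

In a real system the update step would more likely be a learned or gradient-based procedure; the mean is used here only to keep the data flow visible.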
Novel Architectural Elements
User-Vocabulary extension: Directly embedding billion-scale user IDs into the LLM's token space
Collaborative-driven initialization module: Initializing these new user tokens via an item-oriented BPR loss pre-training step
Modeling
Base Model: Large Language Model (Specific architecture not detailed in text)
Training Method: Two-stage training: User Vocabulary Initialization then Distribution Fine-tuning
Objective Functions:
Purpose: Initialize user tokens to capture behavioral patterns before LLM training.
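Formally: Item-oriented BPR loss L_BPR = -sum(ln(sigma(u_pos - u_neg))), where u_pos and u_neg are read here as the item's scores for an interacting (positive) and a non-interacting (negative) user; minimizing it ranks positive users above negative users for each item.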
Purpose: Optimize the LLM to predict the correct users for a given item context.
Formally: Log-softmax loss L = -sum(y_u * log(p(u|c_i))) over the user set.
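The two objectives above can be written out for a single training example as a rough sketch. It assumes dot-product scoring between item and user embeddings for the BPR term and binary labels y_u for the log-softmax term; these choices and all names are illustrative assumptions rather than details confirmed by the source.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def bpr_loss(item_emb, pos_user_emb, neg_user_emb):
    """Item-oriented BPR for one (item, positive user, negative user) triple:
    -ln(sigma(score_pos - score_neg)), written as softplus(-(diff))."""
    diff = dot(item_emb, pos_user_emb) - dot(item_emb, neg_user_emb)
    return math.log1p(math.exp(-diff))

def distribution_loss(probs, interacting_users):
    """Log-softmax loss for one item: -sum over users of y_u * log p(u | c_i),
    with y_u = 1 for interacting users and 0 otherwise."""
    return -sum(math.log(probs[u]) for u in interacting_users)

# Hypothetical toy values.
item = [1.0, 0.0]
liked_by = [1.0, 0.0]      # embedding of a user who interacted with the item
not_liked_by = [0.0, 1.0]  # embedding of a user who did not

stage1 = bpr_loss(item, liked_by, not_liked_by)
stage2 = distribution_loss([0.5, 0.25, 0.25], interacting_users=[0])
```

The BPR term shrinks as the positive user's score exceeds the negative user's score, and the log-softmax term shrinks as probability mass moves onto the users who actually interacted.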
Training Data:
Historical interactions used to construct positive/negative user pairs for initialization
Item content texts paired with interacting users for SFT
Compute: Not reported in the paper
Comparison to Prior Work
vs. ColdLLM: FilterLLM predicts the user distribution in one forward pass (O(1) LLM calls w.r.t. the number of users), whereas ColdLLM judges user-item pairs one at a time (O(N) calls, or fewer only after pre-filtering)
vs. Wang et al.: FilterLLM models user distributions directly rather than just sampling item pairs
vs. TALLRec: FilterLLM uses ID-based tokens for users rather than textual descriptions, enabling scaling to billions [not cited in paper]
Limitations
Requires maintaining a massive user vocabulary which scales with the number of users
Re-training or incremental updates needed as new users join the platform (Cold-start User problem within the system)
Performance depends heavily on the quality of the collaborative initialization of user tokens
Reproducibility
No replication artifacts mentioned in the paper. Code URL is not provided. Dataset names and specific hyperparameters are not included in the provided text.
📊 Experiments & Results
Evaluation Setup
Cold-start item recommendation where items have no history but content is available