LLM for Recommendation | Knowledge Augmentation | Click-Through Rate (CTR) Prediction
KAR augments recommender systems by using large language models to generate user preference reasoning and item factual knowledge, which are then compressed by a hybrid-expert adaptor for efficient deployment.
Core Problem
Classical recommender systems are 'insulated' within closed domains, lacking external world knowledge, while direct use of LLMs suffers from high latency and a 'compositional gap' where they fail at the specific task of ranking items.
Why it matters:
Closed systems miss contextual clues (e.g., seasonal preferences or external events) that are obvious to humans but absent in ID-based data.
Directly deploying LLMs in industrial systems is impractical because of strict latency budgets (usually under 100 ms) and serving cost.
LLMs struggle with the specific 'compositional' task of recommendation despite understanding the sub-problems, leading to suboptimal accuracy compared to specialized models.
Concrete Example: A user might watch holiday movies during Christmas. A classical ID-based model only sees a behavior pattern, but an LLM can explicitly reason 'User is interested in holiday themes due to the season.' Current systems miss this explicit reasoning.
Factorization Prompting: Breaks the recommendation problem into generating 'reasoning knowledge' (user preferences) and 'factual knowledge' (item details) separately to bypass the LLM's compositional weakness.
Hybrid-Expert Adaptor: A specialized neural module that transforms verbose, high-dimensional LLM outputs into compact dense vectors compatible with traditional recommenders, filtering noise via mixture-of-experts.
Pre-storage Strategy: Decouples LLM generation from real-time inference by generating and caching knowledge offline, so that no LLM call sits on the online serving path.
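The factorization prompting and pre-storage ideas above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the prompt wording, the function names (`build_preference_prompt`, `build_fact_prompt`, `prestore_knowledge`), and the `stub_llm` stand-in are all assumptions made for this example. Any real LLM client would be passed in as the `llm` callable.

```python
from typing import Callable, Dict, List

# Hypothetical prompt templates; the paper's exact wording is not reproduced here.
PREFERENCE_TEMPLATE = (
    "The user has watched: {history}. "
    "Analyze the user's preferences and explain your reasoning."
)
FACT_TEMPLATE = "Introduce the item '{title}' and describe its key attributes."


def build_preference_prompt(history: List[str]) -> str:
    """Sub-task 1: ask for reasoning knowledge about one user's preferences."""
    return PREFERENCE_TEMPLATE.format(history=", ".join(history))


def build_fact_prompt(title: str) -> str:
    """Sub-task 2: ask for factual knowledge about one item."""
    return FACT_TEMPLATE.format(title=title)


def prestore_knowledge(
    users: Dict[str, List[str]],
    items: Dict[str, str],
    llm: Callable[[str], str],
) -> Dict[str, Dict[str, str]]:
    """Run every LLM call offline and cache the generated text by ID.

    Online serving only reads from the returned store; it never calls the LLM.
    """
    store: Dict[str, Dict[str, str]] = {"user": {}, "item": {}}
    for user_id, history in users.items():
        store["user"][user_id] = llm(build_preference_prompt(history))
    for item_id, title in items.items():
        store["item"][item_id] = llm(build_fact_prompt(title))
    return store


def stub_llm(prompt: str) -> str:
    """Stand-in for a real LLM call, used only to keep the sketch runnable."""
    return "generated text for: " + prompt


knowledge_store = prestore_knowledge(
    users={"u1": ["Home Alone", "Elf"]},
    items={"i1": "Elf"},
    llm=stub_llm,
)
print(knowledge_store["user"]["u1"])
```

The user and item prompts are never combined into a single "should this user click this item" question, which is the point of the factorization: each sub-task is one the LLM handles well on its own.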
Architecture
The transition from closed-world systems (learning only from domain data) to open-world systems (acquiring reasoning/factual knowledge from LLMs), and the KAR pipeline transforming this knowledge into vectors.
Evaluation Highlights
+7% improvement in online A/B testing on Huawei's news recommendation platform compared to the production baseline.
+1.7% improvement in online A/B testing on Huawei's music recommendation platform compared to the production baseline.
Reported to significantly outperform state-of-the-art baselines on public datasets (only the qualitative claim is available here; the numeric result tables are not included).
Breakthrough Assessment
8/10
Achieves a rare successful deployment of LLM-augmented recommendation in a large-scale industrial setting (Huawei) with measurable online gains, and addresses the critical latency bottleneck by moving LLM generation offline.
⚙️ Technical Details
Problem Definition
Setting: Binary classification (Click-Through Rate prediction) over multi-field categorical data augmented with external knowledge.
Inputs: User/Item categorical features x (sparse one-hot vectors) and augmented knowledge vectors from LLMs.
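To make the input format concrete, the sketch below one-hot encodes a multi-field categorical sample and appends an augmented knowledge vector. The field names, vocabularies, and vector values are invented for illustration, and `build_input` is a hypothetical helper. In a real CTR model the one-hot fields would typically pass through embedding layers rather than being used raw.

```python
from typing import Dict, List


def one_hot(value: str, vocabulary: List[str]) -> List[float]:
    """Encode one categorical field as a sparse one-hot vector."""
    return [1.0 if value == candidate else 0.0 for candidate in vocabulary]


def build_input(
    sample: Dict[str, str],
    vocabularies: Dict[str, List[str]],
    augmented: List[float],
) -> List[float]:
    """Concatenate the one-hot fields, then append the augmented vector."""
    x: List[float] = []
    for field, vocabulary in vocabularies.items():
        x.extend(one_hot(sample[field], vocabulary))
    return x + augmented


# Illustrative vocabularies and sample (assumptions, not from the paper).
vocabularies = {
    "gender": ["f", "m"],
    "genre": ["comedy", "drama", "holiday"],
}
sample = {"gender": "m", "genre": "holiday"}
augmented_vector = [0.2, -0.1]  # would come from the adaptor in KAR

print(build_input(sample, vocabularies, augmented_vector))
# → [0.0, 1.0, 0.0, 0.0, 1.0, 0.2, -0.1]
```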
Generate reasoning knowledge (user preferences) and factual knowledge (item details) via factorization prompting.
Model or implementation: ChatGPT (gpt-3.5-turbo) or LLaMA
Hybrid-Expert Adaptor
Transform and condense generated text embeddings into augmented vectors compatible with the RS, filtering noise.
Model or implementation: Mixture-of-Experts (MoE) style MLP
Recommendation Model
Predict user engagement by combining original features with augmented vectors.
Model or implementation: Model-agnostic (e.g., DeepFM, DCN)
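The adaptor stage can be illustrated with a generic mixture-of-experts layer. The sketch below is a plain softmax-gated MoE written with the standard library only. It is not the paper's exact hybrid-expert design, and the class name `MoEAdaptor`, the dimensions, and the random untrained weights are assumptions for illustration. In KAR the adaptor's weights would be learned jointly with the recommendation model.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(weights: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class MoEAdaptor:
    """Compress a large text embedding into a small dense vector.

    Each expert is a single linear layer followed by ReLU; a softmax gate
    decides how much each expert contributes for a given input.
    """

    def __init__(self, in_dim: int, out_dim: int, num_experts: int, seed: int = 0):
        rng = random.Random(seed)
        self.experts = [random_matrix(out_dim, in_dim, rng) for _ in range(num_experts)]
        self.gate = random_matrix(num_experts, in_dim, rng)

    def gate_weights(self, x: Vector) -> Vector:
        return softmax(matvec(self.gate, x))

    def __call__(self, x: Vector) -> Vector:
        weights = self.gate_weights(x)
        expert_outputs = [
            [max(0.0, v) for v in matvec(expert, x)] for expert in self.experts
        ]
        out_dim = len(expert_outputs[0])
        # Gate-weighted sum of the expert outputs, one output dimension at a time.
        return [
            sum(g * out[j] for g, out in zip(weights, expert_outputs))
            for j in range(out_dim)
        ]


adaptor = MoEAdaptor(in_dim=8, out_dim=3, num_experts=4)
text_embedding = [0.5] * 8          # stand-in for an encoded LLM output
augmented_vector = adaptor(text_embedding)
print(len(augmented_vector))        # 3 values, ready to join the CTR model input
```

Because the gate is input-dependent, different pieces of generated text can lean on different experts, which is one way such a module can down-weight noisy or irrelevant knowledge.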
Novel Architectural Elements
Hybrid-expert adaptor designed specifically to bridge the semantic space of LLMs and the collaborative space of RS.
Factorization prompting workflow that structurally separates user reasoning from item facts before ingestion.
Modeling
Base Model: Model-agnostic framework (tested with DCN, DeepFM, etc.); LLM backbone is ChatGPT or LLaMA.
Training Method: Joint training of the Hybrid-Expert Adaptor and the downstream Recommendation Model.
Objective Functions:
Purpose: Minimize prediction error for binary classification (click vs no-click).
Formally: Log-loss (Binary Cross Entropy).
Compute: Inference latency is minimized by pre-storing LLM knowledge; specific training compute not reported in text.
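For reference, the log-loss objective named above is conventionally written as follows. The symbols are assumptions introduced here, not taken from the paper: N is the number of training samples, y_i in {0, 1} is the click label, and \hat{y}_i is the predicted click probability for sample i.

```latex
\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \Big[ y_i \log \hat{y}_i + (1 - y_i) \log\big(1 - \hat{y}_i\big) \Big]
```

With joint training, the gradient of this loss would flow through both the recommendation model and the Hybrid-Expert Adaptor, while the LLM itself stays outside the training loop.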
Comparison to Prior Work
vs. ChatRec: KAR does not use the LLM for direct ranking/scoring, avoiding latency and accuracy issues; instead, it extracts knowledge to augment a classical model.
vs. TALLRec: KAR focuses on augmenting classical models with open-world knowledge rather than finetuning the LLM itself to be the recommender.
vs. UniSRec: KAR uses LLMs for their reasoning capabilities rather than smaller PLMs such as BERT, which provide semantic encoding only, and it generates new reasoning text rather than only encoding existing text.
Limitations
Depends on the quality of the underlying LLM; hallucinations could introduce noise (the hybrid-expert adaptor is intended to filter it).
Offline generation of knowledge might miss real-time dynamic changes in user interest if not refreshed frequently.
Requires storage overhead for pre-generated knowledge vectors.
Code and generated textual knowledge are publicly available at https://github.com/YunjiaXi/Open-World-Knowledge-Augmented-Recommendation. The paper explicitly mentions releasing these to facilitate future research.
📊 Experiments & Results
Evaluation Setup
Offline evaluation on public datasets and online A/B testing in an industrial environment.
Huawei Music Platform (Industrial Music Recommendation) [New]
Metrics:
Online Improvement (%)
AUC (Offline)
LogLoss (Offline)
Statistical methodology: Not explicitly reported in the paper
Key Results
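Benchmark | Metric | Baseline | This Paper | Δ
Huawei news platform | Online Improvement (%) | Production baseline | KAR | +7%
Huawei music platform | Online Improvement (%) | Production baseline | KAR | +1.7%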
Online A/B testing results demonstrate significant real-world impact when deploying KAR in large-scale industrial systems.
Main Takeaways
KAR achieves a 7% improvement in online A/B testing for Huawei's news recommendation platform.
KAR achieves a 1.7% improvement in online A/B testing for Huawei's music recommendation platform.
The framework effectively bridges the gap between LLM reasoning capabilities and classical recommender efficiency by decoupling generation (offline) from utilization (online).
Factorization prompting successfully mitigates the compositional gap, allowing the system to leverage specific reasoning about user preferences.
📚 Prerequisite Knowledge
Prerequisites
Collaborative Filtering
Large Language Models (LLMs)
Mixture of Experts (MoE)
Click-Through Rate (CTR) Prediction
Key Terms
Factorization Prompting: A prompting strategy that breaks a complex recommendation task into sub-tasks (preference reasoning and factual knowledge generation) to avoid the compositional gap.
Compositional Gap: The phenomenon where models (like LLMs) can solve sub-problems correctly but fail to solve the complex composite problem (like recommending an item).
Hybrid-Expert Adaptor: A module in KAR using multiple expert networks to transform and compress open-world knowledge into dense vectors suitable for recommendation.
Open-world Knowledge: Information outside the closed training loop of a recommender system, specifically reasoning about user motives and factual details about items.
Insulated Nature: The characteristic of classical recommender systems being trained only on domain-specific logs without access to broader world knowledge.
Hallucination: The tendency of LLMs to generate plausible but factually incorrect information.