Efficient and Deployable Knowledge Infusion for Open-World Recommendations via Large Language Models

📝 Paper Summary

LLM-Augmented Recommendation · Open-World Knowledge Integration · Industrial Recommender Systems

REKI enhances industrial recommender systems by efficiently extracting open-world knowledge about users and items from LLMs via factorization prompting and integrating it as compact vectors, avoiding the high latency of direct LLM inference.

Core Problem

Directly using LLMs as recommenders in industrial settings is impractical due to high inference latency and massive resource consumption, while traditional models lack access to open-world knowledge and complex reasoning capabilities.

Why it matters:

Closed-loop recommenders are isolated from external world knowledge, leading to outdated or imprecise recommendations.
LLMs offer reasoning and factual knowledge but are too slow and costly for real-time inference with billions of users and items.
Existing methods that fine-tune LLMs struggle with the scale and dynamic nature of industrial data.

Concrete Example: In movie recommendation, a traditional model might miss that a user prefers 'holiday-themed movies' during Christmas because it sees only ID interactions. An LLM could infer this preference, but running an LLM for every user request in real time is too slow (latency > 100ms).

Key Novelty

Recommendation with Efficient Knowledge Infusion (REKI)

Uses 'factorization prompting' to break down complex user preference reasoning into smaller sub-problems, mitigating the compositional gap of LLMs.
Introduces 'collective knowledge extraction' to cluster users/items and generate knowledge for groups rather than individuals, drastically reducing offline compute for large-scale systems.
Employs a 'Hybridized Expert-Integrated Network (HEIN)' to compress textual knowledge into dense vectors, making it compatible with any conventional recommendation model.
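
To make the idea of factorization prompting concrete, here is a minimal sketch of how a prompt might be assembled around explicit factors. The function names, the prompt wording, and the example factors (genre, director, mood) are hypothetical illustrations and are not the prompts used in the paper.

```python
def build_preference_prompt(history, factors):
    """Ask the LLM to analyse a user's history along explicit factors.

    Both the wording and the factor names are illustrative assumptions.
    """
    items = "; ".join(history)
    factor_list = ", ".join(factors)
    return (
        f"The user has interacted with the following items: {items}. "
        f"Analyze the user's preferences, considering factors such as {factor_list}. "
        "Give a short rationale for each factor."
    )


def build_item_prompt(title, factors):
    """Ask the LLM for factual knowledge about one item along the same factors."""
    factor_list = ", ".join(factors)
    return f"Describe the item '{title}' in terms of: {factor_list}."


# Hypothetical movie-domain factors; a real deployment would pick its own.
movie_factors = ["genre", "director", "mood"]
user_prompt = build_preference_prompt(["Home Alone", "Elf"], movie_factors)
item_prompt = build_item_prompt("The Polar Express", movie_factors)
print(user_prompt)
print(item_prompt)
```

Splitting the request into named factors gives the LLM smaller sub-questions to answer, which is the intuition behind mitigating the compositional gap.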

Architecture

The overall framework of REKI, illustrating the two-stage process: Knowledge Extraction (via LLM) and Knowledge Integration (via HEIN into CRM).

Evaluation Highlights

Achieved a 7% improvement in online A/B testing on Huawei's news recommendation platform.
Achieved a 1.99% improvement in online A/B testing on Huawei's music recommendation platform.
Outperforms state-of-the-art baselines on public datasets (Amazon Beauty, Sports, Toys) with improvements in NDCG@10 and Recall@10.

Breakthrough Assessment

8/10

Offers a highly practical, deployment-oriented solution that successfully bridges the gap between LLM capabilities and industrial constraints. The collective extraction strategy is a smart fix for scalability.

⚙️ Technical Details

Problem Definition

Setting: Top-N Recommendation with Open-World Knowledge Augmentation

Inputs: User interaction history H_u and Item set I

Outputs: Top-N ranked list of items for user u

Pipeline Flow

Knowledge Extraction (Offline): LLM generates textual knowledge for users/items via Factorization Prompting
Knowledge Integration (Offline/Training): Text is encoded and compressed into dense vectors via HEIN
Recommendation (Online): Augmented vectors are combined with ID features in a CRM for final scoring
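
The online stage can be pictured as a table lookup followed by ordinary scoring. The sketch below assumes the knowledge vectors were already produced offline and stored per user and per item; the dot-product scorer is a stand-in for a real CRM, and all sizes and variable names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_USERS, NUM_ITEMS, ID_DIM, KNOW_DIM = 5, 7, 8, 4  # toy sizes (assumed)

# Learned ID embeddings of the base recommender (random here).
user_id_emb = rng.normal(size=(NUM_USERS, ID_DIM))
item_id_emb = rng.normal(size=(NUM_ITEMS, ID_DIM))

# Knowledge vectors precomputed offline; no LLM is called at serving time.
user_know = rng.normal(size=(NUM_USERS, KNOW_DIM))
item_know = rng.normal(size=(NUM_ITEMS, KNOW_DIM))


def score(user, items):
    """Concatenate ID and knowledge features, then score with a dot product."""
    u = np.concatenate([user_id_emb[user], user_know[user]])
    v = np.concatenate([item_id_emb[items], item_know[items]], axis=1)
    return v @ u


def top_n(user, n):
    """Return the indices of the n highest-scoring items for one user."""
    scores = score(user, np.arange(NUM_ITEMS))
    return np.argsort(-scores)[:n]


print(top_n(user=0, n=3))
```

Because the only extra online work is a vector lookup and a wider input, serving cost stays close to that of the base model.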

System Modules

Knowledge Generator

Generate reasoning about user preferences and factual details about items

Model or implementation: LLMs (e.g., ChatGPT, LLaMA-2-7B)

Text Encoder (Knowledge Integration)

Convert generated text into high-dimensional embeddings

Model or implementation: Sentence-BERT (all-MiniLM-L6-v2)

HEIN (Hybridized Expert-Integrated Network) (Knowledge Integration)

Compress and adapt high-dim text embeddings into low-dim recommendation vectors

Model or implementation: Mixture of Experts (MoE) MLP layers
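
As a rough illustration of compressing text embeddings with a mixture of experts, the sketch below implements a generic softly gated MoE of two-layer MLPs in NumPy. It is not the paper's exact HEIN design: the number of experts, the hidden size, and the gating scheme are assumptions. The input size of 384 matches the output dimension of all-MiniLM-L6-v2, and the output size of 64 mirrors the embedding dimension listed under Key Hyperparameters.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class MoECompressor:
    """Generic soft-gated mixture of two-layer MLP experts (illustrative only)."""

    def __init__(self, in_dim, hidden_dim, out_dim, num_experts, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(scale=0.1, size=(num_experts, in_dim, hidden_dim))
        self.w2 = rng.normal(scale=0.1, size=(num_experts, hidden_dim, out_dim))
        self.gate = rng.normal(scale=0.1, size=(in_dim, num_experts))

    def __call__(self, x):
        # x: (batch, in_dim) text embeddings from the frozen encoder.
        gate_w = softmax(x @ self.gate)                        # (batch, experts)
        hidden = np.einsum("bi,eih->beh", x, self.w1)          # per-expert layer 1
        hidden = np.maximum(hidden, 0.0)                       # ReLU
        expert_out = np.einsum("beh,eho->beo", hidden, self.w2)
        # Weighted sum of expert outputs gives the compact vector.
        return np.einsum("be,beo->bo", gate_w, expert_out)     # (batch, out_dim)


text_emb = np.random.default_rng(1).normal(size=(3, 384))
hein_like = MoECompressor(in_dim=384, hidden_dim=128, out_dim=64, num_experts=4)
compact = hein_like(text_emb)
print(compact.shape)
```

In training, these weights would be learned jointly with the recommender while the text encoder stays frozen, as described under Adaptation.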

Recommender Backbone

Predict user-item scores using both ID features and augmented knowledge

Model or implementation: Any CRM (e.g., DIN, SASRec, LightGCN)

Novel Architectural Elements

Separation of LLM inference (offline knowledge generation) from online recommendation flow
Collective Knowledge Extraction pipeline: Clustering -> Prototype Generation -> LLM Inference -> Knowledge Broadcasting
HEIN module specifically designed to adapt semantic embeddings to collaborative filtering spaces
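
The collective extraction pipeline (clustering, prototype generation, LLM inference, knowledge broadcasting) can be sketched as follows. This is a toy version under several assumptions: a hand-written k-means, the cluster member nearest the centroid as the prototype, and a stub `fake_llm` function in place of a real LLM call. None of these choices are confirmed by the paper.

```python
import numpy as np


def kmeans(x, k, iters=20, seed=0):
    """Minimal k-means; returns final labels and centroids."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].astype(float)
    for _ in range(iters):
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        for c in range(k):
            members = x[labels == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return dists.argmin(axis=1), centroids


def fake_llm(prompt):
    """Stub standing in for an expensive LLM call (hypothetical)."""
    return f"[knowledge about] {prompt}"


def collective_extract(features, descriptions, k, llm=fake_llm):
    """Call the LLM once per cluster and broadcast the text to all members."""
    labels, centroids = kmeans(features, k)
    knowledge = [None] * len(features)
    calls = 0
    for c in range(k):
        members = np.flatnonzero(labels == c)
        if len(members) == 0:
            continue
        # Prototype: the member closest to the centroid (an assumption).
        d = ((features[members] - centroids[c]) ** 2).sum(-1)
        prototype = members[d.argmin()]
        text = llm(descriptions[prototype])
        calls += 1
        for m in members:
            knowledge[m] = text
    return knowledge, calls


rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(0, 0.1, (10, 2)), rng.normal(5, 0.1, (10, 2))])
descs = [f"user {i}" for i in range(len(feats))]
knowledge, llm_calls = collective_extract(feats, descs, k=2)
print(llm_calls, "LLM calls for", len(feats), "users")
```

The saving comes from the LLM call count scaling with the number of clusters rather than the number of users or items.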

Modeling

Base Model: Varies by experiment (CRM backbones: DIN, SASRec, LightGCN; LLM: ChatGPT/LLaMA-2)

Training Method: Standard supervised learning (e.g., BPR loss or Log loss depending on backbone)

Objective Functions:

Purpose: Minimize recommendation error.

Formally: Standard loss function of the base CRM (e.g., Binary Cross Entropy for DIN, BPR for LightGCN)
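
For reference, the two standard losses named above can be written in a few lines of NumPy. This is a generic sketch of binary cross entropy on logits and of BPR on positive/negative score pairs, not code from the REKI repository; the function names are chosen here for illustration.

```python
import numpy as np


def bce_loss(logits, labels):
    """Binary cross entropy on raw logits, in a numerically stable form."""
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels, dtype=float)
    # max(x, 0) - x*y + log(1 + exp(-|x|)) equals the usual BCE formula.
    per_example = (
        np.maximum(logits, 0.0) - logits * labels + np.log1p(np.exp(-np.abs(logits)))
    )
    return per_example.mean()


def bpr_loss(pos_scores, neg_scores):
    """Bayesian Personalized Ranking: -log sigmoid(pos - neg), averaged."""
    diff = np.asarray(pos_scores, dtype=float) - np.asarray(neg_scores, dtype=float)
    # -log(sigmoid(d)) = log(1 + exp(-d)) = logaddexp(0, -d)
    return np.logaddexp(0.0, -diff).mean()


print(bce_loss([2.0, -1.0], [1, 0]))
print(bpr_loss([3.0, 1.5], [0.5, 1.0]))
```

BCE treats each user-item pair as an independent click prediction, while BPR only asks that an interacted item score higher than a sampled non-interacted one.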

Adaptation: HEIN module is trained from scratch; Text Encoder is frozen

Trainable Parameters: HEIN weights + CRM weights

Training Data:

Amazon datasets (Beauty, Sports, Toys)
Online datasets (Huawei News, Music)

Key Hyperparameters:

learning_rate: 0.001
batch_size: 2048 or 4096
embedding_dimension: 64
HEIN_experts: Not explicitly specified in the text, although an MoE structure is implied

Compute: Online inference latency is comparable to the base CRM (e.g., ~10ms); LLM inference is an offline cost

Comparison to Prior Work

vs. TALLRec/ChatRec: REKI neither fine-tunes the LLM nor uses it for online inference; it extracts knowledge offline to augment CRMs.
vs. KAR: REKI introduces factorization prompting and collective extraction for better reasoning and scalability.
vs. Standard CRMs (SASRec, DIN): REKI adds an explicit open-world knowledge channel.

Limitations

Dependency on the quality of the LLM used for extraction (garbage in, garbage out).
Offline extraction means knowledge may become stale if user preferences shift rapidly before the next update.
Collective extraction trades off personalization granularity for efficiency.

Reproducibility

Code: https://github.com/YunjiaXi/REKI

Code and generated knowledge are publicly available at https://github.com/YunjiaXi/REKI. The exact prompts for all datasets are provided. Proprietary industrial datasets are not released.

📊 Experiments & Results

Evaluation Setup

Sequential and Top-N Recommendation

Benchmarks:

Amazon Beauty (Sequential Recommendation)
Amazon Sports (Sequential Recommendation)
Amazon Toys (Sequential Recommendation)

Metrics:

NDCG@10
Recall@10
Statistical methodology: Not explicitly reported in the paper
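
The two ranking metrics can be computed as shown below. This is a generic sketch with binary relevance; the paper's exact evaluation protocol (for example, how candidates are sampled) is not stated in this summary, so treat the function names and conventions as assumptions.

```python
import math


def recall_at_k(ranked, relevant, k=10):
    """Fraction of relevant items that appear in the top-k of the ranking."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = len(set(ranked[:k]) & relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked, relevant, k=10):
    """Normalized discounted cumulative gain with binary relevance."""
    relevant = set(relevant)
    dcg = sum(
        1.0 / math.log2(pos + 2)
        for pos, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


ranking = ["item_a", "item_b", "item_c"]
target = {"item_b"}
print(recall_at_k(ranking, target), ndcg_at_k(ranking, target))
```

With a single held-out target per user, Recall@10 reduces to a hit indicator and NDCG@10 to 1 / log2(rank + 1), so NDCG additionally rewards placing the target near the top.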

Key Results

Performance on public Amazon datasets with SASRec as backbone. REKI consistently improves over the baseline.

Benchmark	Metric	Baseline	This Paper	Δ
Amazon Beauty	NDCG@10	0.4132	0.4357	+0.0225
Amazon Sports	NDCG@10	0.3644	0.3802	+0.0158
Amazon Toys	NDCG@10	0.3821	0.4013	+0.0192

Online A/B test results from Huawei platforms demonstrate real-world efficacy.

Benchmark	Metric	Baseline	This Paper	Δ
Huawei News Platform	Improvement (%)	0	7.00	+7.00%
Huawei Music Platform	Improvement (%)	0	1.99	+1.99%
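
The Δ values for the Amazon datasets are absolute NDCG@10 differences. As a small worked example, the relative gain over the baseline can be derived from the same numbers; the helper name `relative_gain` is introduced here only for illustration.

```python
# NDCG@10 values copied from the Amazon rows of the results table.
amazon_ndcg = {
    "Amazon Beauty": (0.4132, 0.4357),
    "Amazon Sports": (0.3644, 0.3802),
    "Amazon Toys": (0.3821, 0.4013),
}


def relative_gain(baseline, new):
    """Relative improvement over the baseline, in percent."""
    return (new - baseline) / baseline * 100.0


for name, (baseline, reki) in amazon_ndcg.items():
    print(f"{name}: +{reki - baseline:.4f} absolute, "
          f"+{relative_gain(baseline, reki):.2f}% relative")
```

On these numbers the relative NDCG@10 gains come out at roughly 4 to 5.5 percent per dataset.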

Main Takeaways

REKI is model-agnostic and improves performance across different backbones (SASRec, DIN, LightGCN).
Factorization prompting is crucial; it helps LLMs reason about user preferences more accurately than direct prompting.
Collective knowledge extraction maintains high performance while significantly reducing computational costs, making the system deployable at scale.

📚 Prerequisite Knowledge

Prerequisites

Collaborative Filtering (CF)
Large Language Models (LLMs)
Mixture of Experts (MoE)
Prompt Engineering

Key Terms

CRM: Conventional Recommendation Model—traditional deep learning models like DIN or SASRec that rely on ID-based interaction data

Factorization Prompting: A technique to decompose a complex reasoning task (e.g., recommend items) into simpler sub-tasks (e.g., analyze user history, infer preferences) to help LLMs reason better

Compositional Gap: The inability of LLMs to solve a complex problem even if they can solve all its sub-problems individually

HEIN: Hybridized Expert-Integrated Network—a module in REKI that compresses LLM-generated text embeddings into compact vectors using multiple expert networks

Collective Knowledge Extraction: A strategy for large-scale systems where users/items are clustered, and the LLM generates knowledge for the cluster centroid rather than each individual, saving compute