LKPNR: LLM and KG for Personalized News Recommendation Framework

📝 Paper Summary

News Recommendation · Knowledge Graphs (KG) · Large Language Models (LLM)

LKPNR improves news recommendation by combining traditional encoders with LLMs for deep semantic understanding and Knowledge Graphs for collaborative entity connections, addressing sparse data issues for inactive users.

Core Problem

Traditional news recommendation models struggle with complex semantic understanding of news text and recommend poorly for inactive users, whose click histories are too sparse to learn from. Less popular news items suffer from the same sparsity (the 'long tail problem').

Why it matters:

Existing methods rely heavily on rich historical behaviors, leaving inactive users with poor recommendations
Traditional text encoders (CNN/LSTM) often miss complex semantic nuances and external knowledge connections in news articles
The 'long tail problem' means a vast majority of less popular news items are rarely recommended, reducing diversity and system effectiveness

Concrete Example: A user clicks a few news items about specific entities (e.g., 'Warriors'). A traditional model might fail to recommend a relevant but less popular article about 'D'Angelo Russell' because the user's sparse history contains no clicks on articles that mention him. LKPNR uses the KG to link 'Warriors' to 'D'Angelo Russell' (a team member) and the LLM to understand the trade context, surfacing the relevant article even without direct click history.

Key Novelty

LLM and KG Augmented Personalized News Recommendation (LKPNR)

Augments standard news encoders by running news text through an LLM to extract deep semantic representations (hidden states)
Constructs a Knowledge Graph subgraph for entities in the news, using multi-hop neighbors to capture latent connections between seemingly unrelated news items
Fuses three distinct representations: the general encoder's output, the LLM's semantic vector, and the KG's structural entity vector

Architecture

The complete LKPNR framework. Bottom left shows input news. Center shows the three encoders (General, KG, LLM) producing vectors r_GNE, r_KG, r_LLM. Right side shows user history encoding and click prediction.

Evaluation Highlights

+2.47 AUC points over the NRMS baseline on the sampled MIND dataset (0.6802 → 0.7049, an absolute gain)
+2.25 nDCG@5 points over the NRMS baseline (0.3661 → 0.3886, an absolute gain)
ChatGLM2-6B outperforms larger models like LLAMA2-13B in this framework, likely due to better alignment with the data distribution

Breakthrough Assessment

6/10

Solid integration of two trending technologies (LLM + KG) into established baselines with clear empirical gains. While the architecture is a logical extension rather than a paradigm shift, it effectively addresses the specific problem of semantic sparsity.

⚙️ Technical Details

Problem Definition

Setting: Personalized news recommendation, i.e. predicting click probability given a user's history and a candidate news article

Inputs: User's click history H (sequence of news) and a candidate news article n

Outputs: Click probability score (matching score between user representation and news representation)

Pipeline Flow

Input Processing: Extract title, abstract, and entities from news
Parallel Encoding: Process news through three parallel encoders (General, LLM-Augmented, KG-Augmented)
Fusion: Concatenate representations to form final news vector
User Encoding: Aggregate history of news vectors into user vector
Prediction: Compute dot product between user and candidate news vectors
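The fusion and prediction steps above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the authors' code: the function names are hypothetical, the vectors are toy values, and `mean_pool` is a deliberately simple stand-in for the attention-based user encoder described in this summary.

```python
from typing import List, Sequence

Vector = List[float]


def fuse_news(r_gne: Sequence[float], r_llm: Sequence[float], r_kg: Sequence[float]) -> Vector:
    """Fusion step: concatenate the three encoder outputs into one news vector."""
    return list(r_gne) + list(r_llm) + list(r_kg)


def mean_pool(history: Sequence[Sequence[float]]) -> Vector:
    """Stand-in user encoder (assumption): average the browsed news vectors.

    The summary describes an attention-based aggregator; mean pooling is used
    here only to keep the sketch short.
    """
    n = len(history)
    return [sum(column) / n for column in zip(*history)]


def click_score(user: Sequence[float], candidate: Sequence[float]) -> float:
    """Prediction step: dot product between user and candidate news vectors."""
    return sum(u * c for u, c in zip(user, candidate))


# Toy example: two browsed articles and one candidate.
history = [
    fuse_news([1.0, 0.0], [0.5], [0.25]),
    fuse_news([0.0, 1.0], [0.5], [0.75]),
]
user_vec = mean_pool(history)
candidate = fuse_news([1.0, 0.0], [1.0], [0.0])
print(click_score(user_vec, candidate))
```

Because the three representations are concatenated, the user and candidate vectors must be built with the same fusion order for the dot product to compare like with like.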

System Modules

General News Encoder (News Encoding)

Learn basic word-level semantic representations using traditional methods (e.g., Attention, CNN)

Model or implementation: Based on NRMS or NAML architecture

LLM-Augmented Encoder (News Encoding)

Extract deep semantic features using a pre-trained LLM

Model or implementation: ChatGLM2-6B (default), LLAMA2-13B, or RWKV-7B
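As a rough illustration of turning LLM hidden states into a fixed-size news vector, the sketch below mean-pools token states under a padding mask and applies a linear projection. The pooling strategy, the function names, and the toy dimensions are all assumptions; this summary only states that hidden states are extracted and lists an `llm_projection_dim` of 500.

```python
from typing import List, Sequence


def masked_mean(hidden_states: Sequence[Sequence[float]], mask: Sequence[int]) -> List[float]:
    """Average token hidden states, ignoring positions where mask == 0 (padding)."""
    kept = [h for h, m in zip(hidden_states, mask) if m]
    return [sum(column) / len(kept) for column in zip(*kept)]


def project(vec: Sequence[float], weight: Sequence[Sequence[float]], bias: Sequence[float]) -> List[float]:
    """Linear projection y = W x + b, with W given as a list of output rows."""
    return [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weight, bias)]


# Toy values: 3 tokens (the last is padding), hidden size 2, projection size 3.
hidden = [[1.0, 3.0], [3.0, 5.0], [9.0, 9.0]]
mask = [1, 1, 0]
pooled = masked_mean(hidden, mask)
r_llm = project(pooled, [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, 0.0])
print(pooled, r_llm)
```

In a real setup the hidden states would come from the frozen or fine-tuned LLM and the projection weights would be learned; both are replaced by hand-written numbers here.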

KG-Augmented Encoder (News Encoding)

Capture structural entity relationships via multi-hop graph traversal

Model or implementation: Custom attention-based graph aggregator

LK-Aug User Encoder

Aggregate sequence of browsed news representations into a user profile

Model or implementation: Attention-based sequence aggregator (from NRMS/NAML)

Novel Architectural Elements

Triple-path news encoding: fusing General, LLM, and KG representations
LLM-to-KG bridge: using the LLM's text representation to generate the 'query' vector for the KG attention mechanism, rather than the general encoder's output
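The LLM-to-KG bridge can be pictured as attention pooling over neighbor entity embeddings, with the query derived from the LLM representation. The sketch below is a minimal single-head version under stated assumptions: scaled dot-product scoring, a query that has already been projected to the entity embedding size, and hypothetical function names. The paper's aggregator may differ in all of these details.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def kg_attention(
    query: Sequence[float], neighbors: Sequence[Sequence[float]]
) -> Tuple[List[float], List[float]]:
    """Pool neighbor embeddings with weights given by their match to the query.

    `query` stands for the LLM-derived vector (assumed already projected to the
    entity embedding size); `neighbors` are the entity embeddings to aggregate.
    """
    dim = len(query)
    scores = [
        sum(q * k for q, k in zip(query, nb)) / math.sqrt(dim) for nb in neighbors
    ]
    weights = softmax(scores)
    pooled = [sum(w * nb[i] for w, nb in zip(weights, neighbors)) for i in range(dim)]
    return pooled, weights


# Toy example: the first neighbor points in the same direction as the query.
query = [1.0, 0.0]
neighbors = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
r_kg, weights = kg_attention(query, neighbors)
print(weights)
```

Swapping `query` for the general encoder's output would give the alternative strategy that the summary reports as the weaker of the two.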

Modeling

Base Model: Integrates with NRMS or NAML as the 'General' backbone; uses ChatGLM2-6B, LLAMA2-13B, or RWKV-7B as the LLM component

Training Method: Supervised learning on click logs (negative sampling)

Objective Functions:

Purpose: Maximize likelihood of clicked news over non-clicked news.

Formally: negative log-likelihood loss, L = -Σ_i log(p_i), summed over the positive samples i, where p_i is the predicted click probability of positive sample i.
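A small sketch of this objective for a single positive sample follows. It assumes the convention common in NRMS-style training, where p_i is a softmax over the positive's score and the scores of its sampled negatives (four per positive, matching the negative sampling ratio listed in the hyperparameters). This summary does not spell out the normalization, so treat it as an assumption; `nll_loss` is a hypothetical name.

```python
import math
from typing import Sequence


def nll_loss(pos_score: float, neg_scores: Sequence[float]) -> float:
    """-log softmax probability of the positive among [positive] + negatives."""
    scores = [pos_score] + list(neg_scores)
    peak = max(scores)
    # log-sum-exp with the maximum subtracted for numerical stability
    log_z = peak + math.log(sum(math.exp(s - peak) for s in scores))
    return log_z - pos_score


# One clicked article scored against 4 sampled non-clicked articles.
uninformed = nll_loss(0.0, [0.0, 0.0, 0.0, 0.0])  # all scores equal
confident = nll_loss(5.0, [0.0, 0.0, 0.0, 0.0])   # positive ranked clearly first
print(uninformed, confident)
```

With equal scores the model assigns the positive a probability of 1/5, so the loss equals log(5); raising the positive's score relative to the negatives drives the loss toward zero.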

Training Data:

MIND dataset sampled to 200K users
Train/Validation split on user logs

Key Hyperparameters:

learning_rate: 1e-4
batch_size: 64
negative_sampling_ratio: 4
max_user_history_length: 50
llm_projection_dim: 500
entity_embedding_dim: 100
kg_max_neighbors: 20
kg_hops: 2
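To make `kg_hops` and `kg_max_neighbors` concrete, the sketch below collects neighbors hop by hop from an adjacency dictionary. The graph representation, the truncation rule (keep the first N listed neighbors), and the toy edges are assumptions for illustration; they are not taken from the paper's KG or its sampling code.

```python
from typing import Dict, List, Sequence


def multi_hop_neighbors(
    graph: Dict[str, List[str]],
    seeds: Sequence[str],
    hops: int = 2,
    max_neighbors: int = 20,
) -> List[List[str]]:
    """Return one list of newly reached entities per hop, starting from `seeds`."""
    visited = set(seeds)
    frontier = list(seeds)
    layers: List[List[str]] = []
    for _ in range(hops):
        next_frontier: List[str] = []
        for entity in frontier:
            # Cap the fan-out per entity (assumed rule: keep the first N neighbors).
            for neighbor in graph.get(entity, [])[:max_neighbors]:
                if neighbor not in visited:
                    visited.add(neighbor)
                    next_frontier.append(neighbor)
        layers.append(next_frontier)
        frontier = next_frontier
    return layers


# Toy graph; the edges are illustrative only.
toy_graph = {
    "Warriors": ["D'Angelo Russell", "NBA"],
    "D'Angelo Russell": ["Warriors", "Guard"],
    "NBA": ["Basketball"],
}
layers = multi_hop_neighbors(toy_graph, ["Warriors"], hops=2, max_neighbors=20)
print(layers)
```

Lowering `max_neighbors` trims the subgraph quickly: with a cap of 1, the second hop from 'Warriors' in this toy graph reaches nothing new.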

Compute: NVIDIA Tesla V100

Comparison to Prior Work

vs. NRMS/NAML: LKPNR adds LLM and KG paths to the standard architecture
vs. DKN: LKPNR uses LLM representations to guide KG attention (via the query vector) rather than just CNN features
vs. Chat-REC: LKPNR uses LLM as a feature encoder (generating embeddings) rather than for in-context learning/generation
vs. Liu et al. (2023) [Generative News Rec]: LKPNR focuses on representation learning fusion rather than generating user profiles or data augmentation [not cited in paper]

Limitations

Computational cost of running LLM inference for every news item is high (though caching is possible)
Dependency on external Knowledge Graph quality and coverage
Performance varies significantly with choice of LLM (ChatGLM2 > LLAMA2 > RWKV in their tests)

Reproducibility

Code: https://github.com/Xuan-ZW/LKPNR

Code is publicly available on GitHub, and the MIND dataset is public. Specific hyperparameters (learning rate, batch size) are provided. The KG source (Wiki KG) is implied, but the specific version or dump is not detailed.

📊 Experiments & Results

Evaluation Setup

Offline evaluation on historical click logs

Benchmarks:

MIND (Microsoft News Dataset) (News Recommendation)

Metrics:

AUC (Area Under ROC)
MRR (Mean Reciprocal Rank)
nDCG@5
nDCG@10
Statistical methodology: Not explicitly reported in the paper

Key Results

Benchmark	Metric	Baseline	This Paper	Δ
MIND (Sampled)	AUC	0.6802	0.7049	+0.0247
MIND (Sampled)	nDCG@5	0.3661	0.3886	+0.0225
MIND (Sampled)	AUC	0.6845	0.7023	+0.0178
MIND (Sampled)	AUC	0.7049	0.6997	-0.0052
MIND (Sampled)	AUC	0.7049	0.6842	-0.0207

Experiment Figures

Comparison of query strategies for the KG attention mechanism (General Encoder Query vs. LLM Query) across different numbers of attention heads

Visualization of attention weights on KG neighbors for a specific case study

Main Takeaways

Integrating both LLM and KG outperforms the traditional baselines (NRMS, NAML) on the reported metrics, although no statistical significance testing is reported
LLM contribution is more substantial than KG contribution (larger drop when removed), but both are additive
Using the LLM representation to query the KG works better than using the general encoder representation, suggesting LLMs capture better semantic 'keys' for knowledge retrieval
ChatGLM2-6B performed best among LLMs tested, likely due to bilingual (Chinese/English) training data covering more news context

📚 Prerequisite Knowledge

Prerequisites

Neural News Recommendation (NRMS/NAML architectures)
Knowledge Graphs and multi-hop reasoning
Large Language Models (LLMs) and hidden state extraction
Attention mechanisms

Key Terms

MIND: A large-scale dataset for news recommendation constructed from MSN news logs

Long tail problem: The phenomenon where a small number of popular items get most attention, while the vast majority of items (the tail) are rarely recommended

Knowledge Graph (KG): A structured representation of knowledge using entities (nodes) and relations (edges), used here to link news entities

Hop: A step in a graph traversal; 1-hop neighbors are directly connected, 2-hop are connected via one intermediary

AUC: Area Under the ROC Curve—a metric measuring the model's ability to distinguish between positive (clicked) and negative (non-clicked) samples

MRR: Mean Reciprocal Rank—a metric evaluating how high the first relevant item appears in the recommendation list

nDCG: Normalized Discounted Cumulative Gain—a ranking metric that gives more credit to relevant items appearing higher in the list
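The three metrics defined above can be computed for a single impression as follows. This is a sketch that follows the definitions in this glossary, with binary click labels and hypothetical function names; the official MIND evaluation script may differ in details such as tie handling or how reciprocal ranks are averaged.

```python
import math
from typing import List, Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of (clicked, non-clicked) pairs ranked correctly; ties count half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ranked_indices(scores: Sequence[float]) -> List[int]:
    """Item indices sorted from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def reciprocal_rank(labels: Sequence[int], scores: Sequence[float]) -> float:
    """1 / rank of the first clicked item; MRR is the mean of this over impressions."""
    for rank, i in enumerate(ranked_indices(scores), start=1):
        if labels[i] == 1:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(labels: Sequence[int], scores: Sequence[float], k: int) -> float:
    """DCG of the predicted top-k divided by the DCG of the ideal top-k."""
    top = ranked_indices(scores)[:k]
    dcg = sum(labels[i] / math.log2(rank + 1) for rank, i in enumerate(top, start=1))
    ideal = sorted(labels, reverse=True)[:k]
    idcg = sum(l / math.log2(rank + 1) for rank, l in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0


# Toy impression: four candidates, two of which were clicked.
labels = [0, 1, 0, 1]
scores = [0.9, 0.8, 0.1, 0.3]
print(auc(labels, scores), reciprocal_rank(labels, scores), ndcg_at_k(labels, scores, 5))
```

All three functions score one impression; dataset-level numbers such as those in the results table are averages over many impressions.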