News Recommendation with Category Description by a Large Language Model

📝 Paper Summary

News Recommendation, LLM Data Augmentation

This paper enhances neural news recommendation models by using an LLM to generate detailed textual descriptions for news categories, which are then concatenated with article titles during training and inference.

Core Problem

Standard news recommendation models typically use news categories via generic templates (e.g., 'The news category is {category}'), which fail to capture the semantic richness and specific context of diverse topics.

Why it matters:

Pre-trained language models (PLMs) used in news encoders often lack the specific knowledge to interpret obscure or domain-specific category names (e.g., 'tv-golden-globes') without context.
Manual creation of detailed descriptions for hundreds of news categories is costly and unscalable.
Insufficient category representation limits the ability of recommendation models to accurately match user interests with news content.

Concrete Example: For the category 'tv-golden-globes', a standard model might just see the string or a template 'The news category is tv-golden-globes'. This misses context about the 'television industry', 'nominations', and 'awards', which are explicit in the LLM-generated description.

Key Novelty

LLM-Augmented Category Descriptions

Leverage an LLM (GPT-4) to automatically generate informative, paragraph-length descriptions for all news categories in the dataset.
Integrate these descriptions into existing neural news recommendation architectures by concatenating them with the news title before feeding into the news encoder.

Architecture

The workflow of the proposed method, illustrating the generation of descriptions and their integration into the news encoder.

Evaluation Highlights

Achieved up to roughly 5.8% relative improvement in AUC over template-based baselines when applied to state-of-the-art models (NAML, NRMS, NPA).
Consistently outperformed 'title only' and 'title + template-based' baselines across multiple metrics (AUC, MRR, nDCG) on the MIND dataset.
Demonstrated effectiveness across different PLM backbones (BERT-base, DistilBERT-base).

Breakthrough Assessment

4/10

A straightforward but effective application of LLMs for data augmentation in recommender systems. While not architecturally novel, it provides a practical method for enriching semantic features.

⚙️ Technical Details

Problem Definition

Setting: News recommendation where the goal is to predict the probability of a user clicking a candidate news article based on their history.

Inputs: User browsing history (list of clicked news articles) and a candidate news article.

Outputs: Click probability score.

Pipeline Flow

Category Description Generation (GPT-4)
Text Concatenation (Title + Description)
News Encoder (PLM)
User Encoder (Aggregation of News Vectors)
Click Probability Prediction

System Modules

Category Description Generator

Generates detailed textual explanations for category labels

Model or implementation: GPT-4

News Encoder (Inference / Recommendation)

Encodes news content into a vector representation

Model or implementation: BERT-base or DistilBERT-base

User Encoder (Inference / Recommendation)

Aggregates clicked news vectors into a user profile vector

Model or implementation: Model-specific (Attentive aggregation for NAML/NPA, Multi-head self-attention for NRMS)

Similarity Calculator (Inference / Recommendation)

Computes score between candidate news and user vector

Model or implementation: Dot product
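The user encoder and similarity calculator above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `attentive_pool` is a simplified dot-product attention standing in for the model-specific aggregators (NAML, NPA, and NRMS each use their own learned attention), and the `query` vector, the toy 2-dimensional news vectors, and all function names are assumptions made for the example.

```python
import math

def attentive_pool(news_vectors, query):
    """Softmax-weighted average of clicked-news vectors.

    Simplified stand-in for a learned attention layer: the weight of each
    vector comes from its dot product with a (hypothetical) query vector.
    """
    scores = [sum(q * x for q, x in zip(query, v)) for v in news_vectors]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(news_vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, news_vectors)) for i in range(dim)]

def click_score(user_vector, candidate_vector):
    """Dot product between the user vector and the candidate news vector."""
    return sum(u * c for u, c in zip(user_vector, candidate_vector))

# Toy example: two clicked articles, already encoded by the news encoder.
history = [[1.0, 0.0], [0.0, 1.0]]
query = [0.0, 0.0]  # a zero query yields uniform attention weights
user_vector = attentive_pool(history, query)
score = click_score(user_vector, [2.0, 4.0])
```

In a trained model the score would typically be passed through a softmax over the candidates or a sigmoid to obtain a click probability; the source does not specify which.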

Novel Architectural Elements

Integration of LLM-generated auxiliary text (category descriptions) directly into the PLM input sequence of standard news encoders via concatenation with the [SEP] token.
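A minimal sketch of how the three input variants (title only, title + template, title + description) might be assembled as text is shown here. The helper name `build_encoder_input`, the example title, and the example description are hypothetical; the description is a placeholder written for this sketch, not actual GPT-4 output. The template string is the generic one quoted in the Core Problem section.

```python
SEP = "[SEP]"  # BERT-style segment separator
TEMPLATE = "The news category is {category}"

def build_encoder_input(title, category=None, description=None):
    """Return the text fed to the news encoder.

    - title only:          no category given
    - title + template:    category given, no description
    - title + description: LLM-generated description given
    """
    if description:
        return f"{title} {SEP} {description}"
    if category:
        return f"{title} {SEP} {TEMPLATE.format(category=category)}"
    return title

title = "Full list of this year's nominees"  # hypothetical title
description = (  # placeholder text, not real GPT-4 output
    "News about the Golden Globe Awards, covering nominations and "
    "awards in the television industry."
)

title_only = build_encoder_input(title)
with_template = build_encoder_input(title, category="tv-golden-globes")
with_description = build_encoder_input(title, description=description)
```

With a real tokenizer, the title and description would more likely be passed as a text pair so that the tokenizer inserts the special tokens itself, rather than writing `[SEP]` into the string by hand.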

Modeling

Base Model: BERT-base and DistilBERT-base

Training Method: Supervised learning on click prediction task

Trainable Parameters: News encoder (PLM) and User encoder weights

Training Data:

MIND dataset
Negative sampling: 4 negative samples per positive sample
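The negative sampling step can be illustrated as follows. This is a sketch under assumptions: the function names are hypothetical, negatives are assumed to be drawn from the non-clicked articles of the same impression, and the fallback to sampling with replacement when fewer than four non-clicked articles exist is a common convention rather than something the source states.

```python
import random

def sample_negatives(non_clicked_ids, k=4, rng=None):
    """Pick k non-clicked articles for one clicked (positive) article."""
    rng = rng or random.Random()
    if len(non_clicked_ids) >= k:
        return rng.sample(non_clicked_ids, k)  # without replacement
    # Assumed fallback: sample with replacement when the pool is too small.
    return [rng.choice(non_clicked_ids) for _ in range(k)]

def build_training_example(clicked_id, non_clicked_ids, k=4, rng=None):
    """Return (candidates, labels) with the positive in position 0."""
    negatives = sample_negatives(non_clicked_ids, k, rng)
    candidates = [clicked_id] + negatives
    labels = [1] + [0] * k
    return candidates, labels

rng = random.Random(0)  # fixed seed so the example is reproducible
candidates, labels = build_training_example(
    "N1", ["N2", "N3", "N4", "N5", "N6", "N7"], k=4, rng=rng
)
```

Each training example then contains five candidates (one clicked, four not), over which the model's scores can be compared.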

Key Hyperparameters:

learning_rate: 1e-4
batch_size: 128
epochs: 3
optimizer: AdamW
max_history_length: 50

Compute: Tesla V100 GPU

Comparison to Prior Work

vs. Prompt4NR: Uses generated descriptions as feature input rather than using prompts to guide the PLM's prediction task directly.
vs. Standard NAML/NRMS: Augments input text with external knowledge (LLM descriptions) rather than relying solely on title/body/category-label.
vs. Karimi et al. (2023) [not cited in paper]: Similar to methods enriching short text with external knowledge graphs, but uses LLM generation instead.

Limitations

GPT-4 occasionally generates inaccurate descriptions for broad or ambiguous categories (e.g., 'tunedin' described as purely entertainment when it includes tech/business).
Reliance on paid/closed-source LLM (GPT-4) for data generation.
Input length to the news encoder increases, potentially increasing training/inference cost (though not explicitly analyzed).

Reproducibility

Code: https://github.com/yamanalab/gpt-augmented-news-recommendation

Code is publicly available on GitHub. Prompts for GPT-4 are provided in Figure 3. Uses standard MIND dataset and HuggingFace Transformers (BERT/DistilBERT). GPT-4 API required for data generation step.

📊 Experiments & Results

Evaluation Setup

Offline evaluation on historical click logs

Benchmarks:

MIND (News Recommendation)

Metrics:

AUC
MRR
nDCG@5
nDCG@10
Statistical methodology: Not explicitly reported in the paper
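For readers unfamiliar with these metrics, the sketch below computes them for a single impression with binary click labels. It assumes one common convention (MRR averaged over all clicked items in the impression, nDCG with binary gains); the paper's exact evaluation script may differ, and reported numbers are usually the mean over all impressions.

```python
import math

def auc(labels, scores):
    """Fraction of (clicked, non-clicked) pairs ranked correctly; ties count 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def ranked_labels(labels, scores):
    """Labels reordered from highest to lowest score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]

def mrr(labels, scores):
    """Mean reciprocal rank over the clicked items of one impression."""
    ranked = ranked_labels(labels, scores)
    reciprocal = [1.0 / (pos + 1) for pos, l in enumerate(ranked) if l == 1]
    return sum(reciprocal) / len(reciprocal)

def dcg_at_k(ranked, k):
    """Discounted cumulative gain of the top-k labels (binary gains)."""
    return sum(l / math.log2(pos + 2) for pos, l in enumerate(ranked[:k]))

def ndcg_at_k(labels, scores, k):
    """DCG@k divided by the DCG@k of the ideal ordering."""
    ideal = dcg_at_k(sorted(labels, reverse=True), k)
    return dcg_at_k(ranked_labels(labels, scores), k) / ideal

# Toy impression: four candidates, two of them clicked.
labels = [0, 1, 0, 1]
scores = [0.1, 0.9, 0.4, 0.3]
impression_auc = auc(labels, scores)
impression_mrr = mrr(labels, scores)
impression_ndcg5 = ndcg_at_k(labels, scores, 5)
```

All three functions assume the impression contains at least one clicked and one non-clicked item; otherwise they divide by zero.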

Key Results

Benchmark	Metric	Baseline	This Paper	Δ
Performance comparison using BERT-base as the news encoder backbone across three different recommendation architectures (NAML, NRMS, NPA).
MIND	AUC	0.6698	0.6874	+0.0176
MIND	AUC	0.6710	0.6874	+0.0164
MIND	AUC	0.6385	0.6756	+0.0371
Performance comparison using DistilBERT-base as the news encoder backbone.
MIND	AUC	0.6341	0.6700	+0.0359

Main Takeaways

The proposed method consistently outperforms baselines across all three recommendation models (NAML, NRMS, NPA) and two backbones (BERT, DistilBERT).
Simple template-based category injection ('The news category is...') often yields negligible improvement over using the title alone, suggesting PLMs need more semantic context.
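Improvements are substantial, reaching up to roughly 5.8% relative AUC gain (0.6385 to 0.6756), which supports the claim that detailed descriptions help the encoder understand category semantics. Since no statistical methodology is reported, these gains should not be read as statistically tested.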
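The 5.8% figure can be checked against the Key Results table, assuming it denotes a relative (not absolute) AUC gain. The sketch below recomputes the relative gain for each table row; the rows are listed in table order because the table itself does not label which architecture each row belongs to.

```python
# (baseline AUC, proposed AUC) pairs copied from the Key Results table,
# in table order: three BERT-base rows, then one DistilBERT-base row.
rows = [
    (0.6698, 0.6874),
    (0.6710, 0.6874),
    (0.6385, 0.6756),
    (0.6341, 0.6700),
]

def relative_gain(baseline, proposed):
    """Relative improvement of proposed over baseline, as a fraction."""
    return (proposed - baseline) / baseline

gains = [relative_gain(b, p) for b, p in rows]
best_gain_percent = max(gains) * 100  # largest relative gain, in percent

for (b, p), g in zip(rows, gains):
    print(f"{b:.4f} -> {p:.4f}: +{p - b:.4f} absolute, {g * 100:.2f}% relative")
```

The largest relative gain comes from the third row (0.6385 to 0.6756), which rounds to 5.8%.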

📚 Prerequisite Knowledge

Prerequisites

Basics of neural news recommendation (News Encoder, User Encoder)
Transformer-based language models (BERT)
Content-based filtering concepts

Key Terms

MIND: Microsoft News Dataset—a large-scale dataset for news recommendation research

NAML: Neural News Recommendation with Attentive Multi-View Learning—a model using attention mechanisms to learn representations from different news views (title, body, category)

NRMS: Neural News Recommendation with Multi-Head Self-Attention—a model employing multi-head self-attention to learn user and news representations

NPA: Neural News Recommendation with Personalized Attention—a model utilizing user ID embeddings to personalize attention mechanisms

PLM: Pre-trained Language Model—models like BERT trained on vast text corpora, used here to encode news text

AUC: Area Under the ROC Curve—a performance metric evaluating the model's ability to distinguish between positive (clicked) and negative (non-clicked) samples

MRR: Mean Reciprocal Rank—a metric evaluating the ranking quality, prioritizing correct items appearing higher in the list

nDCG: Normalized Discounted Cumulative Gain—a measure of ranking quality that takes into account the position of relevant items

SEP token: A special token used in BERT-style models to separate two different segments of text input