ELCoRec: Enhance Language Understanding with Co-Propagation of Numerical and Categorical Features for Recommendation

📝 Paper Summary

LLM-based Recommendation, Click-Through Rate (CTR) Prediction

ELCoRec improves LLM-based recommendation by encoding numerical and categorical features via a Graph Attention Network into a soft prompt and using a hybrid template that combines retrieved and recent interactions.

Core Problem

LLMs struggle with recommendation tasks due to 'numerical insensitivity' (treating ratings/timestamps as plain text) and 'encoding overhead' (context windows cannot fit full history or extensive side features).

Why it matters:

LLMs miss the quantitative nuance of user ratings and temporal intervals when processed as text tokens
Retrieval-based methods (like ReLLa) filter long histories but break the continuous time-series sequence, losing trend information
Existing solutions either ignore side features due to token limits or introduce excessive computational costs by encoding every historical item

Concrete Example: An LLM reading 'Rating: 4' treats '4' as a text token similar to 'A', failing to grasp its magnitude relative to '5'. Similarly, retrieving only semantically relevant movies (e.g., 'Sci-Fi') might exclude a user's recent shift toward 'Romance' movies, which a sequential history would capture.

Key Novelty

ELCoRec (Enhance Language understanding with Co-Propagation for Recommendation)

Offloads the processing of numerical (ratings, time) and categorical features to a dedicated Graph Attention Network (GAT) expert model instead of forcing the LLM to parse them as text
Injects the GAT-derived user preference embedding into the LLM as a single 'soft token', bypassing context window limits
Uses a 'Recent interaction Augmented Prompt' (RAP) that stitches together semantically retrieved items (for global interest) and strictly recent items (for local trends) to fix sequential breaks

Architecture

The overall architecture of ELCoRec, illustrating the parallel processing of the RAP textual template and the GAT-based graph encoding, which merge at the LLM input.

Evaluation Highlights

Achieves the highest AUC (0.9254) on MovieLens-1M, surpassing the best LLM baseline (ReLLa) by 0.0119
Outperforms strong non-LLM baselines (e.g., DIN, SASRec) and LLM baselines (TALLRec, CoLLM) across three datasets (MovieLens, Amazon Books, Electronics)
Ablation studies indicate that the RAP template contributes measurably, improving AUC by roughly 0.003-0.005 over retrieval-only prompts

Breakthrough Assessment

7/10

Solid engineering combination of Graph Neural Networks and LLMs for recommendation. Effectively addresses the specific limitations of text-only LLM recommenders (context length and numerical reasoning) with a practical soft-prompting approach.

⚙️ Technical Details

Problem Definition

Setting: Click-Through Rate (CTR) prediction where the goal is to predict binary user preference y (0 or 1) for a target item given user history and side features.

Inputs: User history sequence, item features (categorical and numerical), and target item.

Outputs: Probability of user clicking the target item (mapped from 'Yes'/'No' token probabilities).
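
The mapping from 'Yes'/'No' token scores to a click probability can be sketched as a two-way softmax. The summary only states that the probability is "mapped from 'Yes'/'No' token probabilities", so the exact normalization below is an assumption, and `click_probability` is a hypothetical helper name.

```python
import math


def click_probability(logit_yes: float, logit_no: float) -> float:
    """Two-way softmax over the 'Yes' and 'No' token logits (assumed mapping)."""
    # Subtract the larger logit before exponentiating for numerical stability.
    m = max(logit_yes, logit_no)
    e_yes = math.exp(logit_yes - m)
    e_no = math.exp(logit_no - m)
    return e_yes / (e_yes + e_no)
```

Restricting the softmax to the two answer tokens keeps the output a proper probability even when the LLM places mass on unrelated vocabulary entries.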

Pipeline Flow

Input Processing: Construct RAP text template (Retriever + Recent) & Build Graph (GAT)
GAT Encoding: Propagate features in graph to get user embedding
Injection: Project user embedding to LLM space (Soft Token)
Inference: LLM processes Text Prompt + Soft Token to predict 'Yes'/'No'

System Modules

RAP Constructor

Create text prompt combining semantically relevant items and recent items

Model or implementation: Retrieval via dense embeddings (e.g., from an LLM encoder)
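
A minimal sketch of how a RAP-style prompt might be assembled is shown below. It assumes the history is ordered oldest to newest, that each item already has a dense embedding, and that retrieval uses cosine similarity against the target item. The record fields, the function names, and the prompt wording are all hypothetical and are not taken from the paper.

```python
import math
from typing import Any, Dict, List, Tuple

Item = Dict[str, Any]  # hypothetical record: {"title": str, "rating": int, "vec": [float, ...]}


def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def select_rap_items(history: List[Item], target_vec: List[float],
                     n_retrieved: int, n_recent: int) -> Tuple[List[Item], List[Item]]:
    """Split history into (semantically retrieved items, strictly recent items)."""
    n_recent = min(n_recent, len(history))
    split = len(history) - n_recent
    older, recent = history[:split], history[split:]
    # Rank the older items by similarity to the target and keep the top ones.
    ranked = sorted(range(len(older)),
                    key=lambda i: cosine(older[i]["vec"], target_vec),
                    reverse=True)
    keep = sorted(ranked[:n_retrieved])  # restore chronological order
    return [older[i] for i in keep], recent


def render_prompt(retrieved: List[Item], recent: List[Item], target_title: str) -> str:
    def fmt(items: List[Item]) -> str:
        return "; ".join(f'{it["title"]} ({it["rating"]} stars)' for it in items)
    # Illustrative template text only; the paper's actual wording is not reproduced here.
    return (f"Relevant past items: {fmt(retrieved)}\n"
            f"Most recent items: {fmt(recent)}\n"
            f"Will the user like {target_title}? Answer Yes or No.")
```

Retrieving only from the items older than the recent window avoids listing the same item twice, which is one simple way to spend the token budget on distinct interactions.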

GAT Expert

Encode numerical ratings, timestamps, and categorical side info into a user representation

Model or implementation: 2-layer Graph Attention Network
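
To make the attention-based propagation concrete, here is a single-head graph attention layer in plain Python, following the standard GAT formulation (shared linear map, LeakyReLU-scored neighbor pairs, softmax over each neighborhood). How the paper builds its graph (which nodes stand for ratings, timestamps, or categories) is not specified in this summary, so the node names in the usage note are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def gat_layer(features: Dict[str, Vector], neighbors: Dict[str, List[str]],
              W: Matrix, a: Vector) -> Dict[str, Vector]:
    """One single-head GAT layer; `a` has length 2 * (output dimension)."""
    z = {node: matvec(W, h) for node, h in features.items()}
    out: Dict[str, Vector] = {}
    for i in features:
        # Attend over the node itself (self-loop) plus its listed neighbors.
        group = [i] + [j for j in neighbors.get(i, []) if j != i]
        scores = [leaky_relu(sum(ak * v for ak, v in zip(a, z[i] + z[j])))
                  for j in group]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        out[i] = [sum(al * z[j][d] for al, j in zip(alphas, group))
                  for d in range(len(z[i]))]
    return out
```

A 2-layer encoder, as listed above, would apply `gat_layer` twice, for example `h2 = gat_layer(gat_layer(h0, nbrs, W1, a1), nbrs, W2, a2)`, and read the user embedding from a node such as `h2["user"]`.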

Projector

Map GAT embedding to LLM token space

Model or implementation: Linear Layer (MLP)
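
The projection and injection step can be illustrated as follows. This sketch assumes a single affine map from the GAT dimension to the LLM hidden size and treats the insertion position of the soft token as a free parameter; neither detail is confirmed by this summary, and both function names are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def project(gat_emb: Vector, W: Matrix, b: Vector) -> Vector:
    """Affine map from GAT space to the LLM embedding space: W @ gat_emb + b."""
    return [sum(w * x for w, x in zip(row, gat_emb)) + bi for row, bi in zip(W, b)]


def inject_soft_token(token_embeddings: List[Vector], soft_token: Vector,
                      position: int = 0) -> List[Vector]:
    """Insert one soft token into the sequence of text-token embeddings."""
    if any(len(t) != len(soft_token) for t in token_embeddings):
        raise ValueError("soft token must match the LLM embedding width")
    return token_embeddings[:position] + [soft_token] + token_embeddings[position:]
```

Because the user representation occupies exactly one sequence position, the cost in context length stays constant no matter how many graph nodes fed the GAT.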

LLM Backbone

Generate prediction based on text context and injected soft token

Model or implementation: Llama-2-7b-chat (with LoRA)

Novel Architectural Elements

Parallel co-propagation of numerical/categorical features via GAT fed into LLM as a single soft token
RAP mechanism explicitly fusing retrieval (semantic) and recent (temporal) sequences in the text prompt

Modeling

Base Model: Llama-2-7b-chat

Training Method: Supervised Instruction Tuning with LoRA

Objective Functions:

Purpose: Optimize CTR prediction accuracy.

Formally: Standard Cross-Entropy Loss (LogLoss) over binary labels (click/no-click).
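
For reference, binary cross-entropy averages -(y log p + (1 - y) log(1 - p)) over examples. A small sketch follows; the clipping constant `eps` is an implementation convenience assumed here, not a value from the paper.

```python
import math
from typing import Sequence


def log_loss(labels: Sequence[int], probs: Sequence[float], eps: float = 1e-12) -> float:
    """Mean binary cross-entropy between 0/1 labels and predicted click probabilities."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)
```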

Adaptation: LoRA (Low-Rank Adaptation) applied to Query and Value matrices

Key Hyperparameters:

learning_rate_llm: 2e-4
learning_rate_gat: 1e-3
batch_size: 32
lora_r: 8
lora_alpha: 16
max_token_length: 1024
GAT_layers: 2
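
As a reminder of what lora_r and lora_alpha control, the sketch below applies a LoRA-adapted linear map in the standard form W x + (alpha / r) B A x, where W stays frozen and only A and B are trained. With the values listed above (r = 8, alpha = 16) the scaling factor is 2. The function name and the toy matrices in the usage note are illustrative only.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def lora_forward(x: Vector, W: Matrix, A: Matrix, B: Matrix,
                 r: int = 8, alpha: float = 16.0) -> Vector:
    """Frozen weight W plus a scaled low-rank update: W x + (alpha / r) * B (A x)."""
    scale = alpha / r
    base = matvec(W, x)    # frozen pretrained projection
    low = matvec(A, x)     # A has shape r x d_in
    delta = matvec(B, low)  # B has shape d_out x r
    return [b + scale * d for b, d in zip(base, delta)]
```

With B initialized to zeros, as is conventional for LoRA, `lora_forward` returns exactly `W x` at the start of training, so fine-tuning begins from the pretrained model's behavior.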

Compute: Single NVIDIA A800 GPU (80GB)

Comparison to Prior Work

vs. TALLRec: ELCoRec adds side information and long history via GAT and RAP, whereas TALLRec relies solely on limited text history.
vs. ReLLa: ReLLa breaks sequential patterns by only retrieving relevant items; ELCoRec's RAP template restores recent history to capture temporal trends.
vs. CoLLM: CoLLM uses external ID embeddings; ELCoRec explicitly models the heterogeneous graph (numerical + categorical) via GAT for the injection.

Limitations

Inference latency is higher than traditional lightweight CTR models (e.g., SASRec) due to LLM decoding.
Requires maintaining and updating a graph structure for the GAT expert.
Context window limits still apply to the textual part of the prompt, necessitating the RAP heuristic.

Reproducibility

Code: https://anonymous.4open.science/r/CIKM_Code_Repo-E6F5/README.md

The paper specifies Llama-2-7b-chat as the backbone. The datasets (MovieLens, Amazon) are standard public benchmarks.

📊 Experiments & Results

Evaluation Setup

Click-Through Rate (CTR) prediction on three standard datasets.

Benchmarks:

MovieLens-1M (Movie Recommendation (CTR))
Amazon Books (Book Recommendation (CTR))
Amazon Electronics (Product Recommendation (CTR))

Metrics:

AUC (Area Under ROC Curve)
LogLoss
Statistical methodology: Not explicitly reported in the paper
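
AUC, the primary metric above, equals the probability that a randomly chosen clicked example is scored higher than a randomly chosen non-clicked one, with ties counted as half. The pairwise sketch below is quadratic in the number of examples and meant for illustration only; a rank-based implementation would be preferable at dataset scale.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```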

Key Results

Main comparison results showing ELCoRec outperforms both traditional deep learning models and LLM-based baselines across all datasets.
Benchmark	Metric	Baseline	This Paper	Δ
MovieLens-1M	AUC	0.9135	0.9254	+0.0119
Amazon Books	AUC	0.8752	0.8841	+0.0089
Amazon Electronics	AUC	0.8912	0.8986	+0.0074
MovieLens-1M	LogLoss	0.3346	0.2798	-0.0548

Ablation study demonstrating the contribution of the Recent interaction Augmented Prompt (RAP) and the GAT expert injection.
Benchmark	Metric	Baseline	This Paper	Δ
MovieLens-1M	AUC	0.9204	0.9254	+0.0050
MovieLens-1M	AUC	0.9152	0.9254	+0.0102

Experiment Figures

Impact of different sequence lengths and retrieved item counts on model performance (AUC).

Main Takeaways

ELCoRec consistently outperforms state-of-the-art methods (ReLLa, TALLRec) across datasets with varying sparsity (dense MovieLens vs sparse Amazon).
The GAT expert captures numerical and categorical information that the LLM misses when processing plain text, as the ablation studies indicate.
The RAP template balances global user interests (via semantic retrieval) and immediate temporal trends (via recent items), solving the 'broken sequence' problem of pure retrieval methods.
Soft prompting proves to be an efficient mechanism to bridge the gap between structured graph data and unstructured language models.

📚 Prerequisite Knowledge

Prerequisites

Basics of Recommender Systems (CTR, user history)
Large Language Models (Prompting, LoRA)
Graph Neural Networks (GAT)

Key Terms

GAT: Graph Attention Network—a neural network that processes data represented as graphs (nodes and edges) using attention mechanisms to weigh neighbor importance

Soft Prompting: Injecting learnable continuous vectors (embeddings) directly into the LLM's input sequence rather than using discrete text tokens

LoRA: Low-Rank Adaptation, a parameter-efficient fine-tuning technique that freezes the pretrained weights and trains small low-rank matrices added to selected weight matrices

CTR: Click-Through Rate, the fraction of impressions that result in a click; CTR prediction estimates the probability that a given user clicks a given item

RAP: Recent interaction Augmented Prompt—a prompt template proposed in this paper that includes both semantically retrieved items and the most recent items from user history

Collaborative Signals: Patterns learned from the interactions of many users and items (e.g., users who liked X also liked Y), often captured by graph models