Full-Stack Optimized Large Language Models for Lifelong Sequential Behavior Comprehension in Recommendation

📝 Paper Summary

LLM-based Recommender Systems, Sequential Recommendation, Long-sequence Modeling

ReLLaX addresses the inability of LLMs to comprehend long user behavior sequences in recommendation by optimizing data retrieval, injecting collaborative soft prompts, and enabling fully interactive parameter adaptation.

Core Problem

LLMs suffer from 'lifelong sequential behavior incomprehension,' where they fail to extract useful information from long user behavior text sequences (e.g., >15 items) even when the text fits within context limits.

Why it matters:

Contrary to NLP tasks where LLMs handle long contexts well, their performance in recommendation peaks at short sequences and degrades as history length increases.
Traditional truncation (keeping only recent items) discards valuable long-term user preference signals necessary for accurate prediction.
Existing adaptations like standard LoRA lack the expressive power to capture the complex dependencies in lifelong behavioral data.

Concrete Example: In a movie recommendation scenario, if a user watched 50 movies, a standard LLM's accuracy might peak after seeing the last 15 and drop if shown all 50. ReLLaX retrieves the most semantically relevant movies (e.g., 'Sci-Fi' genre matches) rather than just the most recent ones and formats them effectively for the LLM.

Key Novelty

Full-Stack Optimization (ReLLaX)

Data-level: Semantic User Behavior Retrieval (SUBR) selects relevant historical items based on semantic similarity to the target, reducing sequence heterogeneity compared to chronological truncation.
Prompt-level: Soft Prompt Augmentation (SPA) projects item embeddings from a conventional recommendation model into soft tokens, injecting collaborative filtering knowledge directly into the LLM input.
Parameter-level: Component Fully-interactive LoRA (CFLoRA) decomposes adaptation matrices into vectors and enables full interaction between them, guided by user history, to increase expressive power.

Architecture

The overall architecture of ReLLaX, illustrating the three optimization levels: Data (SUBR retrieval), Prompt (text + soft tokens), and Parameter (CFLoRA adaptation).

Evaluation Highlights

Achieves state-of-the-art performance on MovieLens-1M (AUC 0.9234), outperforming the best LLM baseline GLRec by +0.0049.
Demonstrates consistent gains on Amazon Books (AUC 0.8932) and Amazon Electronics (AUC 0.8972) over strong baselines such as CoLLM and TALLRec.
Outperforms the conference version (ReLLa) by effectively utilizing Soft Prompt Augmentation and CFLoRA to handle longer sequences without performance degradation.

Breakthrough Assessment

7/10

Significant improvement in adapting LLMs for specific recommendation constraints (long history). Theoretically grounded modification of LoRA (CFLoRA) is a strong technical contribution beyond simple prompt engineering.

⚙️ Technical Details

Problem Definition

Setting: Click-Through Rate (CTR) Prediction with textual and behavioral history inputs.

Inputs: User profile, chronological interaction sequence H_u, and target item i_c (represented as both text and ID embeddings).

Outputs: Probability y_hat (scalar between 0 and 1) indicating likelihood of clicking the target item.

Pipeline Flow

Data Processing: Retrieval of relevant user behaviors (SUBR)
Prompt Construction: Textual hard prompts + ID-based soft prompts (SPA)
Inference: LLM pointwise scoring with CFLoRA adaptation

System Modules

Semantic User Behavior Retrieval (SUBR)

Selects top-K historical items semantically relevant to the target item to reduce noise and sequence length issues.

Model or implementation: Retrieval via pre-trained CRM embeddings
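
The retrieval step described above can be sketched as a top-K nearest-neighbor lookup. This is a minimal illustration, not the paper's implementation: the `emb` lookup table, the use of cosine similarity, and the choice to return the retrieved items in their original chronological order are all assumptions made for the example.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def retrieve_top_k(history: List[str], target: str,
                   emb: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Return the k history items most similar to the target item.

    `emb` is a hypothetical item -> embedding table (assumption).
    The selected items are returned in chronological order (assumption).
    """
    scored = [(cosine(emb[item], emb[target]), idx)
              for idx, item in enumerate(history)]
    # Highest similarity first; ties go to the more recent item.
    top = sorted(scored, key=lambda s: (-s[0], -s[1]))[:k]
    keep = sorted(idx for _, idx in top)
    return [history[i] for i in keep]


# Toy usage: "B" points away from the target, so it is dropped.
toy_emb = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [0.9, 0.1], "T": [1.0, 0.05]}
print(retrieve_top_k(["A", "B", "C"], "T", toy_emb, k=2))  # ['A', 'C']
```

Unlike chronological truncation, the selection depends on the target item, so the same user history yields a different retrieved subsequence for each candidate.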

Soft Prompt Augmentation (SPA)

Injects collaborative knowledge by transforming item IDs into soft tokens.

Model or implementation: Lightweight Projector (Linear Layer)
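
A rough sketch of the projection step, assuming a single linear layer that maps a CRM item embedding to the LLM hidden size and that the resulting soft tokens are appended after the text token embeddings. The function names, toy dimensions, and placement of the soft tokens are illustrative assumptions; the summary does not specify them.

```python
from typing import List

Vector = List[float]


def project(item_emb: Vector, weight: List[Vector], bias: Vector) -> Vector:
    """Linear layer: maps a CRM embedding (d_crm) to the LLM hidden size (d_llm).

    `weight` has shape (d_llm, d_crm) and `bias` has shape (d_llm,).
    """
    return [sum(w * x for w, x in zip(row, item_emb)) + b
            for row, b in zip(weight, bias)]


def build_inputs(text_token_embs: List[Vector], item_embs: List[Vector],
                 weight: List[Vector], bias: Vector) -> List[Vector]:
    """Concatenate hard-prompt token embeddings with projected soft tokens.

    Appending the soft tokens at the end is an assumption for illustration.
    """
    soft_tokens = [project(e, weight, bias) for e in item_embs]
    return text_token_embs + soft_tokens


# Toy usage: d_crm = 2, d_llm = 3.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [0.0, 0.0, 0.5]
print(project([2.0, 3.0], W, b))  # [2.0, 3.0, 5.5]
```

Only the projector's weight and bias would be trained here; the CRM embeddings themselves are treated as fixed inputs in this sketch.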

LLM with CFLoRA

Generates 'Yes'/'No' probabilities for ranking.

Model or implementation: Llama-2-7b-chat optimized with CFLoRA

Novel Architectural Elements

CFLoRA: Replaces standard LoRA matrices A and B with fully interactive atom components (vectors) guided by user/item context.
Hybrid Prompting: Concatenates text-based history (Hard Prompt) with projected embedding-based history (Soft Prompt) in a single input sequence.
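
One way to read the CFLoRA idea listed above: standard LoRA's update B·A is a sum of r rank-1 terms that pair the i-th column of B only with the i-th row of A. Letting every column interact with every row amounts to inserting an r×r matrix G between them. The sketch below illustrates that reading only; the interaction matrix `G` and how it would be produced from user history are assumptions, not the paper's exact formulation.

```python
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain matrix product of x (m x k) and y (k x n)."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))]
            for i in range(len(x))]


def lora_delta(B: Matrix, A: Matrix) -> Matrix:
    """Standard LoRA update: sum_i b_i a_i^T, i.e. B @ A."""
    return matmul(B, A)


def interactive_delta(B: Matrix, G: Matrix, A: Matrix) -> Matrix:
    """Fully interactive update: sum_i sum_j G[i][j] * b_i a_j^T, i.e. B @ G @ A.

    G is a hypothetical r x r interaction matrix (assumption); in a
    history-guided variant it would be computed from the user's behaviors.
    """
    return matmul(matmul(B, G), A)


B = [[1, 0], [0, 1], [1, 1]]   # d_out x r
A = [[1, 2], [3, 4]]           # r x d_in
identity = [[1, 0], [0, 1]]
swap = [[0, 1], [1, 0]]

# With G = I only matching components interact, which recovers standard LoRA.
print(interactive_delta(B, identity, A) == lora_delta(B, A))  # True
# A non-diagonal G pairs b_1 with a_2 and b_2 with a_1.
print(interactive_delta(B, swap, A))  # [[3, 4], [1, 2], [4, 6]]
```

Under this reading, the rank of the update is still at most r; the added flexibility comes from how the components are combined, not from a larger rank.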

Modeling

Base Model: Llama-2-7b-chat

Training Method: Instruction Tuning with CFLoRA

Objective Functions:

Purpose: Optimize the probability of generating the correct label ('Yes' or 'No').

Formally: Causal Language Modeling loss on the output tokens.

Purpose: Estimate CTR during inference.

Formally: Softmax over the logits of 'Yes' and 'No' tokens.
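
The inference-time scoring above reduces to a two-way softmax. A minimal sketch, assuming the two logits have already been read from the LLM's next-token distribution at the answer position (the function name and inputs are illustrative):

```python
import math


def ctr_from_logits(yes_logit: float, no_logit: float) -> float:
    """Two-way softmax over the 'Yes' and 'No' logits; returns P('Yes')."""
    # Subtract the max before exponentiating for numerical stability.
    m = max(yes_logit, no_logit)
    e_yes = math.exp(yes_logit - m)
    e_no = math.exp(no_logit - m)
    return e_yes / (e_yes + e_no)


print(ctr_from_logits(0.0, 0.0))  # 0.5
```

Restricting the softmax to these two tokens ignores the rest of the vocabulary, so the score is always a valid probability between 0 and 1 and is equivalent to a sigmoid of the logit difference.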

Adaptation: CFLoRA (Component Fully-interactive LoRA)

Trainable Parameters: LoRA parameters (decomposed vectors) and SPA projector

Key Hyperparameters:

learning_rate: 1e-4
batch_size: 16
epochs: 2
lora_rank: 8
context_length: 2048
retrieval_k: 10 (items)

Compute: 8 NVIDIA A800-80G GPUs

Comparison to Prior Work

vs. ReLLa: ReLLaX adds Soft Prompt Augmentation (SPA) and Component Fully-interactive LoRA (CFLoRA) for better expressiveness.
vs. CoLLM: ReLLaX uses a specialized retrieval mechanism (SUBR) and a more complex LoRA structure (CFLoRA) rather than standard soft prompting.
vs. LoRA+ / VeRA [not cited in paper]: The paper theoretically analyzes CFLoRA as a generalization of these methods, offering more flexible component interactions.

Limitations

The retrieval mechanism (SUBR) relies on the quality of a pre-trained conventional recommendation model (CRM).
Inference latency is higher than conventional models due to LLM decoding, though only a single forward pass is needed for scoring.
The method focuses on re-ranking/scoring and depends on an upstream candidate generation phase.

Reproducibility

Code: https://github.com/LaVieEnRose365/ReLLaX

The paper uses public datasets (MovieLens-1M, Amazon Books, Amazon Electronics).

📊 Experiments & Results

Evaluation Setup

CTR prediction on held-out test sets using a leave-one-out strategy.

Benchmarks:

MovieLens-1M (Movie Recommendation)
Amazon Books (E-commerce Recommendation)
Amazon Electronics (E-commerce Recommendation)

Metrics:

AUC (Area Under Curve)
LogLoss
Statistical methodology: Not explicitly reported in the paper
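
For reference, the two metrics above can be computed as follows. This is a small stdlib-only sketch for illustration; the pairwise AUC is quadratic in the number of examples, and a real evaluation would more likely use an optimized library routine.

```python
import math
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def log_loss(labels: Sequence[int], scores: Sequence[float],
             eps: float = 1e-12) -> float:
    """Mean binary cross-entropy; predictions are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(labels, scores):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


# Three of the four positive/negative pairs are ordered correctly.
print(auc([1, 0, 1, 0], [0.9, 0.3, 0.4, 0.6]))  # 0.75
```

AUC depends only on the ranking of the scores, while LogLoss also penalizes poorly calibrated probabilities, which is why the two are usually reported together for CTR prediction.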

Key Results

ReLLaX outperforms both Conventional Recommendation Models (CRMs) and LLM-based baselines across all three datasets.

Benchmark	Metric	Baseline	This Paper	Δ
MovieLens-1M	AUC	0.9185	0.9234	+0.0049
Amazon Books	AUC	0.8856	0.8932	+0.0076
Amazon Electronics	AUC	0.8916	0.8972	+0.0056

Ablation studies confirm the contribution of each module (SPA and CFLoRA) to the overall performance.

Benchmark	Metric	Baseline	This Paper	Δ
MovieLens-1M	AUC	0.9213	0.9234	+0.0021
MovieLens-1M	AUC	0.9208	0.9234	+0.0026

Experiment Figures

Performance (AUC) of various LLMs (Vicuna, Mistral, Llama-2) vs. sequence length compared to a conventional model (SIM).

Performance comparison of different LoRA variants (LoRA, LoRA+, DoRA, VeRA, etc.) against CFLoRA on MovieLens-1M.

Main Takeaways

Standard LLMs show a 'sawtooth' performance profile where accuracy degrades as sequence length increases beyond ~15 items; ReLLaX maintains or improves performance with longer retrieved sequences.
The 'Full-Stack' approach is necessary: Data retrieval (SUBR) alone improves over random sampling, but adding Prompt (SPA) and Parameter (CFLoRA) optimization yields cumulative gains.
CFLoRA is theoretically shown to be a generalization of existing LoRA variants, providing a flexible framework for injecting user context into the adaptation weights.

📚 Prerequisite Knowledge

Prerequisites

Parameter-Efficient Fine-Tuning (specifically LoRA)
Sequential Recommendation / CTR Prediction
Collaborative Filtering embeddings

Key Terms

LoRA: Low-Rank Adaptation—a technique to fine-tune LLMs by training small rank-decomposition matrices while keeping the main model frozen.

CTR: Click-Through Rate—the probability that a user will click on a recommended item.

Soft Prompts: Learnable vectors injected into the input sequence that don't correspond to fixed vocabulary words but guide the model's behavior.

Collaborative Knowledge: Information regarding user-item interactions and patterns derived from conventional recommendation models (like similar users liking similar items).

Lifelong Sequential Behavior Incomprehension: A phenomenon defined in this paper where LLMs fail to extract information from long behavior sequences even when within context limits.

SUBR: Semantic User Behavior Retrieval—selecting historical items relevant to the target item rather than just recent ones.

CFLoRA: Component Fully-interactive LoRA—a proposed LoRA variant allowing full interaction between decomposed vector components.

CRM: Conventional Recommendation Model—traditional deep learning models for recommendation (e.g., DIN, SIM) used here to provide embeddings.