SPRINT: Scalable and Predictive Intent Refinement for LLM-Enhanced Session-based Recommendation

📝 Paper Summary

Session-based Recommendation (SBR) | LLM-based User Profiling

SPRINT enhances session-based recommendation by selectively invoking LLMs during training to populate a global intent pool, then training a lightweight predictor to infer these intents at inference time without LLM dependency.

Core Problem

Directly applying LLMs to session-based recommendation fails because short, anonymous sessions lack sufficient context for reliable profiling (leading to hallucinations) and per-session LLM inference is prohibitively slow.

Why it matters:

Real-world session data is voluminous and anonymous, making the computational cost of live LLM inference impractical for deployment
Short sessions (e.g., avg length ~3.6 on Yelp) provide too little signal for standard LLM profiling, resulting in vague or noisy descriptions that degrade recommendation accuracy
Existing intent modeling methods lack explicit validation mechanisms to ensure generated intents actually contribute to predictive performance

Concrete Example: Sessions browsing 'eco-friendly' and 'environmentally friendly' skincare could yield redundant, slightly different text profiles if each session is processed independently by an LLM. Furthermore, on a short session with just 3 clicks, an LLM might hallucinate a 'summer necessity' intent that isn't grounded in the data, misleading the recommender.

Key Novelty

Scalable and Predictive Intent Refinement (SPRINT)

Constrains LLM generation to a shared 'Global Intent Pool' (GIP) to ensure consistency and reduces hallucinations by treating generation as an identification task over this pool
Selectively invokes LLMs only for 'hard' sessions (high uncertainty) during training, validating intents via a Predict-and-Correct loop where the LLM must successfully predict the next item to 'commit' an intent
Eliminates LLM latency at inference by distilling knowledge into a lightweight 'Intent Predictor', trained on intent labels that are enriched via collaborative signals from similar sessions
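Because generation is treated as identification over the GIP, near-duplicate intents such as 'eco-friendly' and 'environmentally friendly' should collapse onto a single pool entry. The sketch below illustrates that effect under a clearly stated assumption: a free-text proposal is snapped onto the most similar existing entry by token-set Jaccard similarity (threshold 0.5, a hypothetical value) and added as a new entry only when nothing matches. SPRINT itself may instead have the LLM choose directly from the listed pool; `identify_intent` and its helpers are hypothetical names.

```python
def tokens(text: str) -> set:
    """Lowercase and split on whitespace, treating hyphens as spaces."""
    return set(text.lower().replace("-", " ").split())


def jaccard(a: set, b: set) -> float:
    """Token-set Jaccard similarity (assumed similarity measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def identify_intent(proposal: str, pool: list, threshold: float = 0.5) -> str:
    """Snap a proposed intent onto the closest pool entry, or grow the pool.

    `threshold` is a hypothetical value, not a SPRINT hyperparameter.
    """
    best, best_sim = None, 0.0
    for entry in pool:
        sim = jaccard(tokens(proposal), tokens(entry))
        if sim > best_sim:
            best, best_sim = entry, sim
    if best is not None and best_sim >= threshold:
        return best          # reuse the existing intent: no near-duplicate text
    pool.append(proposal)    # genuinely new concept: add it to the shared pool
    return proposal


pool = ["eco-friendly skincare", "anti-aging"]
print(identify_intent("environmentally friendly skincare", pool))  # eco-friendly skincare
print(pool)  # ['eco-friendly skincare', 'anti-aging']
```

Token overlap is a crude stand-in; an embedding-based similarity would catch paraphrases that share no words.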

Architecture

Figure: The overall SPRINT framework, illustrating the two-stage process: (1) Uncertainty-aware session selection and LLM-based intent refinement (P&C loop) to populate the Global Intent Pool, and (2) Training the SBR model with a lightweight Intent Predictor that learns from the pool.
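The Predict-and-Correct loop in stage (1) can be sketched as follows. `llm_propose` and `llm_predict` are hypothetical placeholders for prompted LLM calls, not a real API; the feedback text, the candidate shuffling, and withholding the true item from the feedback are assumptions. An intent is committed only if, given that intent, the LLM picks the held-out next item from the candidates within `max_iters` (T = 3) attempts.

```python
import random
from typing import Callable, List, Optional


def predict_and_correct(
    session: List[str],
    target: str,
    negatives: List[str],
    llm_propose: Callable[[List[str], Optional[str]], str],
    llm_predict: Callable[[List[str], str, List[str]], str],
    max_iters: int = 3,  # T: maximum P&C iterations
    seed: int = 0,
) -> Optional[str]:
    """Return an intent validated by a correct next-item prediction, else None."""
    rng = random.Random(seed)
    candidates = negatives + [target]  # sampled negatives plus the held-out item
    feedback = None
    for _ in range(max_iters):
        intent = llm_propose(session, feedback)            # Predict: propose an intent
        rng.shuffle(candidates)                            # avoid positional hints
        choice = llm_predict(session, intent, candidates)  # pick the next item
        if choice == target:
            return intent                                  # commit the intent
        # Correct: report the miss without revealing the answer (assumption)
        feedback = f"Intent '{intent}' led to '{choice}', which was wrong."
    return None                                            # nothing committed


# Stub "LLM" calls for illustration only
def propose(session, feedback):
    return "anti-aging" if feedback else "summer necessity"


def predict(session, intent, candidates):
    return "retinol serum" if intent == "anti-aging" else "sunscreen"


print(predict_and_correct(["night cream", "eye cream"], "retinol serum",
                          ["sunscreen", "lip balm"], propose, predict))  # anti-aging
```

With the stubs, the ungrounded first guess ('summer necessity') fails validation, one correction round follows, and only the second intent is committed.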

Breakthrough Assessment

8/10

Addresses the critical bottleneck of LLM latency in recommendation by completely removing the LLM from the inference loop while still leveraging its reasoning capabilities during training.

⚙️ Technical Details

Problem Definition

Setting: Session-based Recommendation (SBR) where the goal is to predict the next item given a short, anonymous sequence of interactions

Inputs: Session S = [i_1, i_2, ..., i_N]

Outputs: Probability distribution for the next item i_next

Pipeline Flow

Session Encoder (Generates base representation)
Intent Predictor (Infers intent scores from Global Pool)
Intent Fusion (Combines base rep with intent signals)
Prediction Head (Outputs item probabilities)

System Modules

Session Encoder

Encode the raw item sequence into a dense vector representation

Model or implementation: Conventional SBR model (e.g., SASRec, GRU4Rec)

Intent Predictor

Identify relevant intents from the Global Intent Pool without using an LLM

Model or implementation: Attention-based query-key-value mechanism (Lightweight MLP)

Intent Fusion

Enhance the session representation with relevant intent information

Model or implementation: Gating mechanism with threshold filtering
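The modules above can be wired into a single forward pass. This is a toy, pure-Python sketch with several assumptions: mean pooling stands in for the SASRec/GRU4Rec encoder; intent scores are sigmoid(q·k/√d) with a linear query projection `w_q`; one GIP embedding matrix serves as both keys and values; fusion mixes the base representation with a score-weighted average of the intents that clear τ, through a scalar gate computed from a hypothetical weight vector `w_gate`; and the prediction head is a dot-product softmax over item embeddings. All function and parameter names are illustrative.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def encode_session(item_ids, item_emb):
    """Session Encoder stand-in: mean-pool item embeddings (not SASRec/GRU4Rec)."""
    vecs = [item_emb[i] for i in item_ids]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def predict_intents(h, w_q, intent_keys):
    """Intent Predictor: sigmoid(q . k / sqrt(d)) for every pool intent."""
    q = [dot(row, h) for row in w_q]  # linear query projection
    scale = math.sqrt(len(q))
    return [sigmoid(dot(q, k) / scale) for k in intent_keys]


def fuse(h, scores, intent_values, w_gate, tau=0.5):
    """Intent Fusion: gated mix of h and the intents whose score clears tau."""
    kept = [(s, v) for s, v in zip(scores, intent_values) if s >= tau]
    if not kept:
        return list(h)  # no confident intent: fall back to the base representation
    total = sum(s for s, _ in kept)
    intent_vec = [sum(s * v[d] for s, v in kept) / total for d in range(len(h))]
    gate = sigmoid(dot(w_gate, h + intent_vec))  # h + intent_vec concatenates lists
    return [(1 - gate) * x + gate * y for x, y in zip(h, intent_vec)]


def forward(item_ids, item_emb, w_q, intent_emb, w_gate, tau=0.5):
    """Full pass: encoder -> intent predictor -> fusion -> prediction head."""
    h = encode_session(item_ids, item_emb)
    scores = predict_intents(h, w_q, intent_emb)   # intent embeddings as keys
    z = fuse(h, scores, intent_emb, w_gate, tau)   # ...and as values
    return softmax([dot(z, e) for e in item_emb])  # next-item distribution


item_emb = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
intent_emb = [[1.0, 0.0], [0.0, 1.0]]
probs = forward([0, 2], item_emb, [[1.0, 0.0], [0.0, 1.0]], intent_emb, [0.0] * 4)
print([round(p, 3) for p in probs])
```

No LLM appears anywhere in this path, which is the point of the decoupled design: at inference the pool is just an embedding matrix.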

Novel Architectural Elements

Decoupled Intent Predictor: A standalone module that learns to mimic LLM intent reasoning but operates independently at inference time
Global Intent Pool integration: The pipeline queries a shared, learned embedding matrix of intents rather than generating text

Modeling

Base Model: Compatible with various SBR backbones (e.g., SASRec, GRU4Rec)

Training Method: Two-stage training: (1) LLM-based intent generation on hard sessions, (2) Joint training of SBR and Intent Predictor with self-training
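The self-training step relies on collaborative intent enrichment, which propagates LLM-derived labels to unannotated sessions. A minimal sketch, assuming behavioral similarity is cosine similarity between session representations and that a label transfers when at least `min_votes` of the `k` nearest annotated sessions carry it. The k-NN voting rule and all names are assumptions, not the paper's stated procedure. Per the hyperparameter list, the enrichment would be refreshed every ρ = 5 epochs.

```python
import math


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0


def enrich_labels(reps, labels, k=3, min_votes=0.5):
    """Propagate intent labels to unannotated sessions by k-NN voting.

    reps:   session representations (list of vectors), indexed by session id
    labels: {session id: set of intent ids} for LLM-annotated sessions only
    Returns {session id: set of intent ids} for the unannotated sessions.
    """
    annotated = list(labels)
    enriched = {}
    for i, rep in enumerate(reps):
        if i in labels:
            continue  # keep LLM-validated labels untouched
        neighbours = sorted(annotated, key=lambda j: cosine(rep, reps[j]),
                            reverse=True)[:k]
        if not neighbours:
            enriched[i] = set()
            continue
        votes = {}
        for j in neighbours:
            for intent in labels[j]:
                votes[intent] = votes.get(intent, 0) + 1
        enriched[i] = {t for t, v in votes.items() if v / len(neighbours) >= min_votes}
    return enriched


reps = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.95, 0.05]]
labels = {0: {"anti-aging"}, 1: {"anti-aging", "hydration"}, 2: {"eco"}, 3: {"eco"}}
print(enrich_labels(reps, labels))  # {4: {'anti-aging'}}
```

Session 4's three nearest annotated neighbours are 0, 1 and 3, so only 'anti-aging' (2 of 3 votes) clears the 0.5 threshold.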

Objective Functions:

Purpose: Optimize standard next-item prediction accuracy.
Formally: Cross-entropy loss L_rec over the item space.

Purpose: Train the Intent Predictor to match LLM-derived or enriched intent labels.
Formally: Binary cross-entropy loss L_intent between predicted scores y_hat and targets y.

Purpose: Encourage intent embeddings to capture distinct semantic aspects.
Formally: L_ortho penalizes cosine similarity between different intent embeddings in the pool.
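The three objectives can be written out on plain lists. The cross-entropy and binary cross-entropy follow their standard definitions. The exact form of L_ortho is not given in this summary, so the sketch assumes the mean squared cosine similarity over distinct pairs of pool embeddings. How the losses are combined is also not stated; the weights `lambda_intent` and `lambda_ortho` are hypothetical.

```python
import math


def cross_entropy(logits, target):
    """L_rec for one session: -log softmax(logits)[target] over the item space."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def binary_cross_entropy(y_hat, y, eps=1e-7):
    """L_intent: mean BCE between predicted intent scores and (enriched) labels."""
    total = 0.0
    for p, t in zip(y_hat, y):
        p = min(max(p, eps), 1.0 - eps)  # clip to keep log() finite
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(y)


def orthogonality_penalty(intent_emb):
    """L_ortho (assumed form): mean squared cosine over distinct intent pairs.

    Assumes every embedding is non-zero.
    """
    def cos(a, b):
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return sum(x * y for x, y in zip(a, b)) / (na * nb)

    n = len(intent_emb)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    if not pairs:
        return 0.0
    return sum(cos(intent_emb[i], intent_emb[j]) ** 2 for i, j in pairs) / len(pairs)


def total_loss(logits, target, y_hat, y, intent_emb,
               lambda_intent=1.0, lambda_ortho=0.1):
    """Weighted sum; the weights are hypothetical, not values reported for SPRINT."""
    return (cross_entropy(logits, target)
            + lambda_intent * binary_cross_entropy(y_hat, y)
            + lambda_ortho * orthogonality_penalty(intent_emb))
```

Squaring the cosine penalizes both aligned and anti-aligned intent pairs; an absolute value would behave similarly.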

Key Hyperparameters:

uncertainty_percentile_r: 10% (top 10% hardest sessions selected for LLM)
damping_constant_k: 60 (for reciprocal rank fusion of uncertainty scores)
candidate_items_M: 5 (negative samples for P&C loop)
p_c_iterations_T: 3 (max retries in Predict-and-Correct)
fusion_threshold_tau: 0.5
enrichment_update_freq_rho: 5 epochs
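Uncertainty-aware selection combines session entropy and prediction entropy (defined under Key Terms) through reciprocal rank fusion. A sketch under these assumptions: session entropy is Shannon entropy over item categories in the session; prediction entropy is computed from the backbone's next-item distribution; each signal is ranked from most to least uncertain; the fused score is the sum of 1/(k + rank) with k = 60; and the top r = 10% of sessions go to the LLM. `select_hard_sessions` is a hypothetical name.

```python
import math
from collections import Counter


def entropy(probs):
    """Shannon entropy (natural log) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def session_entropy(categories):
    """Diversity of a session, assumed to be measured over item categories."""
    counts = Counter(categories)
    n = len(categories)
    return entropy([c / n for c in counts.values()])


def select_hard_sessions(sessions, next_item_probs, k=60, r_percent=10):
    """Rank sessions by two uncertainty signals, fuse with RRF, keep the top r%."""
    n = len(sessions)
    signals = [
        [session_entropy(s) for s in sessions],  # session entropy
        [entropy(p) for p in next_item_probs],   # prediction entropy
    ]
    fused = [0.0] * n
    for scores in signals:
        order = sorted(range(n), key=lambda i: scores[i], reverse=True)
        for rank, i in enumerate(order, start=1):  # rank 1 = most uncertain
            fused[i] += 1.0 / (k + rank)           # reciprocal rank fusion
    n_select = max(1, n * r_percent // 100)        # integer math avoids float rounding
    return sorted(range(n), key=lambda i: fused[i], reverse=True)[:n_select]


sessions = [["skincare"] * 3 for _ in range(10)]
sessions[3] = ["skincare", "books", "tools"]  # diverse interests
probs = [[0.9, 0.05, 0.05] for _ in range(10)]
probs[3] = [1 / 3, 1 / 3, 1 / 3]              # backbone is unsure
print(select_hard_sessions(sessions, probs))  # [3]
```

RRF uses only ranks, so the two entropies need no common scale; the damping constant k = 60 keeps any single top rank from dominating the fused score.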

Compute: Substantially lower than full-profiling methods; the LLM is invoked only for the 10% of training sessions with the highest uncertainty and never during inference.

Comparison to Prior Work

vs. PO4ISR/Re2LLM: SPRINT removes the LLM from inference entirely, avoiding latency issues
vs. EGRec: SPRINT does not fine-tune the LLM, preserving pretrained knowledge and reducing cost
vs. LLM4SBR: SPRINT explicitly validates intents via the Predict-and-Correct loop, whereas LLM4SBR relies on unverified LLM outputs

Limitations

Relies on the quality of the LLM for the initial seed intents; poor LLM reasoning on hard sessions could propagate errors
The Global Intent Pool size and quality depend on the initialization prompts and domain coverage
Requires a two-stage training pipeline which is more complex than standard end-to-end SBR training

Reproducibility

The method relies on an LLM for the training stage; the specific model is not named in the available text. Code URL not provided in the text. Global Intent Pool initialization requires domain-specific prompting.

📊 Experiments & Results

Evaluation Setup

Next-item prediction on real-world datasets

Benchmarks:

Beauty (Session-based Recommendation)
Yelp (Session-based Recommendation)
Book (Session-based Recommendation)

Metrics:

Recall
NDCG
Statistical methodology: Not explicitly reported in the paper

Main Takeaways

Quantitative results are not available in the provided text snippet, but the paper claims consistent improvements over state-of-the-art baselines.
Selective LLM invocation (only the top-10% hardest training sessions) implies roughly 90% fewer sessions sent to the LLM than generating profiles for every session; the exact saving depends on how many P&C iterations each hard session needs.
The framework effectively decouples high-level reasoning (LLM) from real-time scoring (Intent Predictor), allowing the benefits of profiling without the inference cost.
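A quick budget check on the efficiency claim, counting training sessions only and assuming that full-profiling baselines make one LLM call per session and that each P&C iteration is one LLM call. Under those assumptions SPRINT needs between 10% and 30% of the baseline's training-time calls (r = 10%, T = 3), plus zero calls at inference, where per-session profiling methods need one call per test session. `llm_call_budget` is a hypothetical helper.

```python
def llm_call_budget(n_train_sessions, r_percent=10, max_iters=3):
    """Training-time LLM calls: one per session vs. SPRINT's selective P&C."""
    hard = n_train_sessions * r_percent // 100
    return {
        "full_profiling": n_train_sessions,  # one call per session (assumed)
        "sprint_best": hard,                 # every hard session passes on try 1
        "sprint_worst": hard * max_iters,    # every hard session uses all T tries
    }


print(llm_call_budget(100_000))
# {'full_profiling': 100000, 'sprint_best': 10000, 'sprint_worst': 30000}
```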

📚 Prerequisite Knowledge

Prerequisites

Session-based Recommendation (SBR) architectures (SASRec, RNNs)
Large Language Models (LLMs) for profiling
Knowledge Distillation / Student-Teacher frameworks

Key Terms

SBR: Session-based Recommendation—recommending items based on a short, anonymous sequence of current user interactions

Global Intent Pool (GIP): A shared, expandable set of intent concepts (e.g., 'anti-aging') maintained across all sessions to constrain LLM output space

Predict-and-Correct (P&C): A validation loop where the LLM's generated intent is accepted only if it helps correctly predict a held-out item from the session

Intent Predictor: A lightweight, trainable module (non-LLM) that infers relevance scores for intents in the GIP given a session representation

Collaborative Intent Enrichment: A semi-supervised strategy where the model propagates intent labels from LLM-annotated sessions to unannotated sessions based on behavioral similarity

Session Entropy: A measure of the diversity of items within a session; high entropy implies diverse/inconsistent interests

Prediction Entropy: A measure of the model's uncertainty in predicting the next item; high entropy implies low confidence