University of Illinois Urbana-Champaign,
Korea University
arXiv
(2025)
Recommendation, P13N
📝 Paper Summary
Session-based Recommendation (SBR), LLM-based User Profiling
SPRINT enhances session-based recommendation by selectively invoking LLMs during training to populate a global intent pool, then training a lightweight predictor to infer these intents at inference time without LLM dependency.
Core Problem
Directly applying LLMs to session-based recommendation fails because short, anonymous sessions lack sufficient context for reliable profiling (leading to hallucinations) and per-session LLM inference is prohibitively slow.
Why it matters:
Real-world session data is voluminous and anonymous, making the computational cost of live LLM inference impractical for deployment
Short sessions (e.g., avg length ~3.6 on Yelp) provide too little signal for standard LLM profiling, resulting in vague or noisy descriptions that degrade recommendation accuracy
Existing intent modeling methods lack explicit validation mechanisms to ensure generated intents actually contribute to predictive performance
Concrete Example: Two sessions browsing 'eco-friendly' and 'environmentally friendly' skincare may receive redundant, slightly different text profiles if an LLM processes them independently. Furthermore, on a short session with just 3 clicks, an LLM might hallucinate a 'summer necessity' intent that is not grounded in the data, misleading the recommender.
Key Novelty
Scalable and Predictive Intent Refinement (SPRINT)
Constrains LLM generation to a shared 'Global Intent Pool' (GIP) to ensure consistency and reduces hallucinations by treating generation as an identification task over this pool
Selectively invokes LLMs only for 'hard' sessions (high uncertainty) during training, validating intents via a Predict-and-Correct loop where the LLM must successfully predict the next item to 'commit' an intent
Eliminates LLM latency at inference by distilling knowledge into a lightweight 'Intent Predictor' that enriches intent labels via collaborative signals from similar sessions
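The Predict-and-Correct loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `propose_intent` and `predict_item` are hypothetical callables standing in for LLM prompts, the held-out item is assumed to be the last item of the session, and the candidate set is assumed to be that item plus a handful of sampled negatives.

```python
import random
from typing import Callable, List, Optional, Set

# Hypothetical callable types standing in for LLM prompts (assumptions).
ProposeFn = Callable[[List[str], Set[str], Optional[str]], str]
PredictFn = Callable[[List[str], str, List[str]], str]


def predict_and_correct(
    session: List[str],
    intent_pool: Set[str],
    catalog: List[str],
    propose_intent: ProposeFn,
    predict_item: PredictFn,
    num_negatives: int = 5,
    max_iterations: int = 3,
    seed: int = 0,
) -> Optional[str]:
    """Return a validated intent, or None if every attempt fails."""
    rng = random.Random(seed)
    # Assumption: the last item is held out as the prediction target.
    context, target = session[:-1], session[-1]

    # Build the candidate set: sampled negatives plus the true target.
    negatives = [item for item in catalog if item not in session]
    sampled = rng.sample(negatives, min(num_negatives, len(negatives)))
    candidates = sampled + [target]
    rng.shuffle(candidates)

    rejected: Optional[str] = None  # feedback passed to the next attempt
    for _ in range(max_iterations):
        intent = propose_intent(context, intent_pool, rejected)
        if predict_item(context, intent, candidates) == target:
            intent_pool.add(intent)  # commit only validated intents
            return intent
        rejected = intent
    return None
```

An intent that never leads to the correct item is simply dropped, so the pool only grows with intents that passed the prediction check.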
Architecture
The overall SPRINT framework, illustrating the two-stage process: (1) Uncertainty-aware session selection and LLM-based intent refinement (P&C loop) to populate the Global Intent Pool, and (2) Training the SBR model with a lightweight Intent Predictor that learns from the pool.
Breakthrough Assessment
8/10
Addresses the critical bottleneck of LLM latency in recommendation by completely removing the LLM from the inference loop while still leveraging its reasoning capabilities during training.
⚙️ Technical Details
Problem Definition
Setting: Session-based Recommendation (SBR) where the goal is to predict the next item given a short, anonymous sequence of interactions
Inputs: Session S = [i_1, i_2, ..., i_N]
Outputs: Probability distribution for the next item i_next
Pipeline Flow
Session Encoder (Generates base representation)
Intent Predictor (Infers intent scores from Global Pool)
Intent Fusion (Combines base rep with intent signals)
Prediction Head (Outputs item probabilities)
System Modules
Session Encoder
Encode the raw item sequence into a dense vector representation
Model or implementation: Conventional SBR model (e.g., SASRec, GRU4Rec)
Intent Predictor
Identify relevant intents from the Global Intent Pool without using an LLM
Model or implementation: Attention-based query-key-value mechanism (Lightweight MLP)
Intent Fusion
Enhance the session representation with relevant intent information
Model or implementation: Gating mechanism with threshold filtering
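To make the module descriptions above concrete, here is a small pure-Python sketch of intent scoring followed by threshold-filtered, gated fusion. It is an illustration under assumptions: the scaled dot-product scoring, the score-weighted average of intent values, and the fixed scalar sigmoid gate are stand-ins for whatever learned components the paper uses, and the names `intent_scores` and `fuse_intents` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def intent_scores(session_vec: Vector, intent_keys: List[Vector]) -> List[float]:
    """Score every pool intent: session vector is the query, pool rows are keys."""
    scale = math.sqrt(len(session_vec))
    return [sigmoid(dot(session_vec, key) / scale) for key in intent_keys]


def fuse_intents(
    session_vec: Vector,
    intent_values: List[Vector],
    scores: List[float],
    tau: float = 0.5,
) -> Vector:
    """Blend the session vector with intents whose score reaches tau."""
    kept = [(s, v) for s, v in zip(scores, intent_values) if s >= tau]
    if not kept:
        return list(session_vec)  # no confident intent: keep the base vector

    # Score-weighted average of the surviving intent value vectors.
    total = sum(s for s, _ in kept)
    dim = len(session_vec)
    intent_vec = [sum(s * v[d] for s, v in kept) / total for d in range(dim)]

    # Assumption: a scalar gate; a trained model would learn this instead.
    gate = sigmoid(dot(session_vec, intent_vec))
    return [(1.0 - gate) * h + gate * z for h, z in zip(session_vec, intent_vec)]
```

With the threshold at 0.5, intents the predictor considers unlikely contribute nothing to the fused representation, which is the filtering role the fusion step is described as playing.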
Novel Architectural Elements
Decoupled Intent Predictor: A standalone module that learns to mimic LLM intent reasoning but operates independently at inference time
Global Intent Pool integration: The pipeline queries a shared, learned embedding matrix of intents rather than generating text
Modeling
Base Model: Compatible with various SBR backbones (e.g., SASRec, GRU4Rec)
Training Method: Two-stage training: (1) LLM-based intent generation on hard sessions, (2) Joint training of SBR and Intent Predictor with self-training
Objective Functions:
Purpose: Optimize standard next-item prediction accuracy.
Formally: Cross-entropy loss L_rec over the item space.
Purpose: Train the Intent Predictor to match LLM-derived or enriched intent labels.
Formally: Binary Cross Entropy loss L_intent between predicted scores y_hat and targets y.
Purpose: Encourage intent embeddings to capture distinct semantic aspects.
Formally: L_ortho penalizes cosine similarity between different intent embeddings in the pool.
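The intent loss and the orthogonality penalty above can be illustrated with a short pure-Python sketch. The exact forms are assumptions: the summary does not state the reduction used for the binary cross-entropy or how cosine similarity is penalized, so this version averages over intents and uses the mean squared pairwise cosine similarity. The function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def bce_loss(y_hat: List[float], y: List[float], eps: float = 1e-12) -> float:
    """Mean binary cross-entropy between predicted scores and 0/1 targets."""
    total = 0.0
    for p, t in zip(y_hat, y):
        # Clamp probabilities so log() never sees zero.
        total += t * math.log(max(p, eps)) + (1.0 - t) * math.log(max(1.0 - p, eps))
    return -total / len(y)


def cosine(a: Vector, b: Vector) -> float:
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def ortho_penalty(intent_embeddings: List[Vector]) -> float:
    """Assumed form: mean squared cosine similarity over distinct intent pairs."""
    n = len(intent_embeddings)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    if not pairs:
        return 0.0
    sq = [cosine(intent_embeddings[i], intent_embeddings[j]) ** 2 for i, j in pairs]
    return sum(sq) / len(pairs)
```

The penalty is zero when all intent embeddings are mutually orthogonal and approaches one as they collapse onto the same direction, which matches the stated purpose of keeping intents semantically distinct.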
Key Hyperparameters:
uncertainty_percentile_r: 10% (top 10% hardest sessions selected for LLM)
damping_constant_k: 60 (for reciprocal rank fusion of uncertainty scores)
candidate_items_M: 5 (negative samples for P&C loop)
p_c_iterations_T: 3 (max retries in Predict-and-Correct)
fusion_threshold_tau: 0.5
enrichment_update_freq_rho: 5 epochs
Compute: Significantly reduced compared to full-profiling methods; LLM is invoked only for 10% of training data and never during inference.
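The selection hyperparameters above suggest the following sketch of how hard sessions might be chosen. It assumes the standard reciprocal rank fusion formula, where each uncertainty signal contributes 1 / (k + rank) and rank 1 is the most uncertain session; which signals are fused and how ties are broken are not stated in this summary, so those details, and the function names, are assumptions.

```python
from typing import Dict, List


def rrf_scores(signals: List[Dict[str, float]], k: int = 60) -> Dict[str, float]:
    """Fuse several uncertainty signals with reciprocal rank fusion."""
    fused: Dict[str, float] = {}
    for signal in signals:
        # Rank 1 goes to the session with the highest uncertainty.
        ranked = sorted(signal, key=signal.get, reverse=True)
        for rank, session_id in enumerate(ranked, start=1):
            fused[session_id] = fused.get(session_id, 0.0) + 1.0 / (k + rank)
    return fused


def select_hard_sessions(
    signals: List[Dict[str, float]], r: float = 0.10, k: int = 60
) -> List[str]:
    """Return the top r fraction of sessions by fused uncertainty."""
    fused = rrf_scores(signals, k)
    n_select = max(1, round(r * len(fused)))
    return sorted(fused, key=fused.get, reverse=True)[:n_select]
```

Because the fusion works on ranks rather than raw values, signals on different scales can be combined without normalization, and the damping constant k limits how much a single top rank dominates.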
Comparison to Prior Work
vs. PO4ISR/Re2LLM: SPRINT removes LLM from inference entirely, avoiding latency issues
vs. EGRec: SPRINT does not fine-tune the LLM, preserving pretrained knowledge and reducing cost
vs. LLM4SBR: SPRINT explicitly validates intents via the Predict-and-Correct loop, whereas LLM4SBR relies on unverified LLM outputs
Limitations
Relies on the quality of the LLM for the initial seed intents; poor LLM reasoning on hard sessions could propagate errors
The Global Intent Pool size and quality depend on the initialization prompts and domain coverage
Requires a two-stage training pipeline which is more complex than standard end-to-end SBR training
Reproducibility
The training stage relies on an LLM; the specific model is not stated in the available text (likely a GPT-series or comparable open-source model). No code URL is provided. Initializing the Global Intent Pool requires domain-specific prompting.
📊 Experiments & Results
Evaluation Setup
Next-item prediction on real-world datasets
Benchmarks:
Beauty (Session-based Recommendation)
Yelp (Session-based Recommendation)
Book (Session-based Recommendation)
Metrics:
Recall
NDCG
Statistical methodology: Not explicitly reported in the paper
Main Takeaways
Quantitative results are not available in the provided text, but the authors claim consistent improvement over state-of-the-art baselines.
Invoking the LLM only on the top-10% hardest sessions implies roughly 90% fewer LLM calls than generating a profile for every session, though no measured savings are reported in the available text.
The framework effectively decouples high-level reasoning (LLM) from real-time scoring (Intent Predictor), allowing the benefits of profiling without the inference cost.
SBR: Session-based Recommendation, i.e., recommending items based on a short, anonymous sequence of current user interactions
Global Intent Pool (GIP): A shared, expandable set of intent concepts (e.g., 'anti-aging') maintained across all sessions to constrain LLM output space
Predict-and-Correct (P&C): A validation loop where the LLM's generated intent is accepted only if it helps correctly predict a held-out item from the session
Intent Predictor: A lightweight, trainable module (non-LLM) that infers relevance scores for intents in the GIP given a session representation
Collaborative Intent Enrichment: A semi-supervised strategy where the model propagates intent labels from LLM-annotated sessions to unannotated sessions based on behavioral similarity
Session Entropy: A measure of the diversity of items within a session; high entropy implies diverse/inconsistent interests
Prediction Entropy: A measure of the model's uncertainty in predicting the next item; high entropy implies low confidence
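The two entropy measures defined above can be computed with the Shannon entropy formula, as in the sketch below. It is illustrative only: the summary does not say what session entropy is measured over, so the code assumes a list of per-item labels (for example item categories), and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, List


def shannon_entropy(probs: Iterable[float]) -> float:
    """Shannon entropy in nats; zero-probability entries are skipped."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def session_entropy(labels: List[str]) -> float:
    """Diversity of a session, assuming one label (e.g. category) per item."""
    counts = Counter(labels)
    total = len(labels)
    return shannon_entropy(c / total for c in counts.values())


def prediction_entropy(item_probs: List[float]) -> float:
    """Uncertainty of the model's next-item probability distribution."""
    return shannon_entropy(item_probs)
```

A session whose items all share one label has zero session entropy, while a uniform next-item distribution gives the maximum prediction entropy for that catalog size.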