Data-efficient Fine-tuning for LLM-based Recommendation

📝 Paper Summary

LLM-based Recommendation Data Pruning / Coreset Selection

DEALRec identifies a small, influential subset of data for fine-tuning recommender LLMs by combining influence scores, estimated cheaply with a small surrogate model, with an effort score that measures LLM-specific difficulty.

Core Problem

Fine-tuning LLMs on massive, rapidly updating recommendation data is computationally prohibitive, while random few-shot sampling misses crucial representative patterns.

Why it matters:

Recommendation data grows explosively (e.g., TikTok has ~942B interactions/day), requiring frequent updates that are too costly for full LLM fine-tuning
Existing coreset selection methods (heuristic or optimization-based) require training the target model on the full data first, which defeats the efficiency goal when the target is an LLM

Concrete Example: A randomly sampled few-shot dataset might miss trending items or specific user behaviors. Conversely, using a traditional surrogate model to pick data might select samples that are hard for the surrogate but trivial for a pre-trained LLM, leading to suboptimal adaptation.

Key Novelty

Two-stage Influence-Effort Scoring (DEALRec)

Efficiently estimates the 'influence score' (impact of removing a sample on global loss) using a small surrogate model and a symmetric HVP approximation to avoid retraining
Introduces an 'effort score' (gradient norm of the LLM on the sample) to correct the surrogate's bias, prioritizing samples that are specifically difficult for the LLM to learn
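Written out, the two scores might look as follows. This is a sketch in the standard Koh & Liang removal form with assumed notation: θ̂ are the trained surrogate's parameters, H_θ̂ is the Hessian of its mean training loss over n samples, and φ are the LLM's trainable (LoRA) parameters. The paper's exact normalization may differ. The factorization in the first line is what lets a single inverse-HVP serve every sample s.

```latex
\mathcal{I}_{\text{remove}}(s)
  \approx \frac{1}{n}\sum_{i=1}^{n} \nabla_\theta \mathcal{L}(s_i,\hat\theta)^{\top} H_{\hat\theta}^{-1}\, \nabla_\theta \mathcal{L}(s,\hat\theta)
  = \frac{1}{n}\, \nabla_\theta \mathcal{L}(s,\hat\theta)^{\top}
    \underbrace{\Big( H_{\hat\theta}^{-1} \sum_{i=1}^{n} \nabla_\theta \mathcal{L}(s_i,\hat\theta) \Big)}_{\text{one inverse-HVP, shared by all } s}

\delta_s = \big\lVert \nabla_\phi \mathcal{L}^{\text{LLM}}(s) \big\rVert_2
```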

Architecture

The overall framework of DEALRec.

Evaluation Highlights

Surpasses full-data fine-tuning performance using only 2% of training samples on MovieLens-1M
Reduces time costs by 97% compared to full fine-tuning while maintaining competitive accuracy
Outperforms random sampling and heuristic coreset methods (e.g., Entropy, Geometry) across three datasets (MovieLens, Amazon Beauty, Amazon Games)

Breakthrough Assessment

8/10

Significantly improves practical viability of LLM recommenders by slashing compute costs (97% reduction) while maintaining or exceeding accuracy. The decoupling of influence (surrogate) and effort (LLM) is a clever engineering solution.

⚙️ Technical Details

Problem Definition

Setting: Select a subset S from training data D (where |S| = r|D|) to fine-tune an LLM such that performance on test data is maximized.

Inputs: Full training set D of user sequences s=(x,y), selection ratio r

Outputs: Pruned subset S for fine-tuning

Pipeline Flow

Surrogate Training: Train a small model (SASRec) on the full data D
Influence Calculation: Compute influence scores on the surrogate using the symmetric HVP approximation
Effort Calculation: Compute effort scores (gradient norms) using the LLM before fine-tuning
Score Combination: Combine the two scores and apply stratified sampling to select subset S
Fine-tuning: Fine-tune the LLM on S

System Modules

Surrogate Model (Scoring)

Estimate global influence of samples cheaply

Model or implementation: SASRec (small traditional recommender)

LLM Scorer (Scoring)

Measure sample difficulty specifically for the LLM

Model or implementation: Llama-2-7B or ChatGLM2-6B (frozen or LoRA)
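A minimal sketch of an effort score, assuming a toy softmax next-item layer (logits = W h) stands in for the LLM and W stands in for its LoRA parameters; a real implementation would backpropagate each sample's loss through the LLM itself. The names `effort_score`, `W`, and `h` are hypothetical. The cross-entropy gradient with respect to W is (p − e_y) hᵀ, so its norm is small when the model already puts high probability on the target item.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                   # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def effort_score(W, h, y):
    """L2 norm of the gradient of -log p(y | h) w.r.t. W, for logits = W @ h.

    W is a hypothetical stand-in for the LLM's trainable (LoRA) parameters,
    h for the LLM's encoding of a user's history x, and y is the target item.
    """
    p = softmax(W @ h)
    err = p.copy()
    err[y] -= 1.0                     # dL/dlogits = p - onehot(y)
    grad_W = np.outer(err, h)         # dL/dW = (p - onehot(y)) h^T
    return np.linalg.norm(grad_W)     # Frobenius norm = L2 norm of the flattened gradient

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 3))           # 5 hypothetical items, 3-dim encodings
h = rng.normal(size=3)
logits = W @ h
easy_y, hard_y = int(np.argmax(logits)), int(np.argmin(logits))
print(effort_score(W, h, easy_y) < effort_score(W, h, hard_y))  # True
```

Since ‖p − e_y‖² = ‖p‖² − 2p_y + 1, the score falls monotonically as p_y rises: samples the model already predicts well demand little effort.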

Sampler

Select representative subset based on combined scores

Model or implementation: Stratified Sampling Algorithm
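The score combination and sampling step can be sketched as below, under two assumptions labeled hypothetical: the scores are combined additively as influence + λ·effort, and the sampler follows a coverage-centric scheme (equal-width strata over the score range, smallest stratum served first) in the style of coverage-centric coreset selection. DEALRec's exact procedure may differ, and the score arrays here are random placeholders.

```python
import numpy as np

def combine_scores(influence, effort, lam):
    # Hypothetical additive form; lam balances surrogate influence and LLM effort.
    return influence + lam * effort

def stratified_sample(scores, budget, num_strata, rng):
    """Select at most `budget` indices spread across equal-width score strata.

    Strata are visited from smallest to largest; each gets an equal share of the
    remaining budget, so quota unused by small strata flows to larger ones.
    """
    edges = np.linspace(scores.min(), scores.max(), num_strata + 1)
    stratum_of = np.digitize(scores, edges[1:-1])       # stratum index in [0, num_strata)
    strata = [np.flatnonzero(stratum_of == k) for k in range(num_strata)]
    strata = sorted((s for s in strata if s.size > 0), key=len)
    selected, remaining = [], budget
    for i, members in enumerate(strata):
        quota = remaining // (len(strata) - i)          # equal share of what is left
        take = min(quota, members.size)
        selected.extend(rng.choice(members, size=take, replace=False).tolist())
        remaining -= take
    return np.array(selected, dtype=int)

rng = np.random.default_rng(0)
influence = rng.normal(size=1000)        # placeholder surrogate influence scores
effort = rng.exponential(size=1000)      # placeholder LLM effort scores
scores = combine_scores(influence, effort, lam=0.5)
subset = stratified_sample(scores, budget=20, num_strata=5, rng=rng)  # r = 0.02
print(len(subset), len(set(subset.tolist())))  # 20 20
```

Serving the smallest strata first keeps rare score ranges (e.g., very high-influence samples) from being crowded out by the dense middle of the distribution.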

Novel Architectural Elements

Symmetric influence estimation: Reformulates influence calculation to require only one HVP estimation for the entire dataset instead of one per sample
Gap regularization: Combines surrogate-based influence with LLM-based effort scores to bridge the capability gap between small models and LLMs
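The symmetric trick can be checked on a toy convex model. In this sketch, ridge regression is a hypothetical stand-in for SASRec, and influence is measured on a separate hypothetical reference set rather than the training set (at the exact optimum of a convex model the training gradients sum to zero, which would make every training-set score vanish). The naive route solves one inverse-HVP per sample; the symmetric route solves once and reuses the result. Both are compared with actual leave-one-out retraining.

```python
import numpy as np

def fit_ridge(X, y, lam):
    # Minimizer of sum_j [0.5*(x_j @ theta - y_j)**2 + 0.5*lam*||theta||**2],
    # i.e. (X^T X + len(X)*lam*I) theta = X^T y.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + len(X) * lam * np.eye(d), X.T @ y)

def ref_loss(theta, X_ref, y_ref):
    return 0.5 * np.sum((X_ref @ theta - y_ref) ** 2)

rng = np.random.default_rng(0)
n, d, lam = 200, 5, 0.1
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_true + 0.1 * rng.normal(size=n)
X_ref = rng.normal(size=(50, d))                  # hypothetical reference set
y_ref = X_ref @ w_true + 0.1 * rng.normal(size=50)

theta = fit_ridge(X, y, lam)
H = X.T @ X / n + lam * np.eye(d)                 # Hessian of the mean training loss
G = (X @ theta - y)[:, None] * X + lam * theta    # per-sample training gradients, (n, d)
g_ref = ((X_ref @ theta - y_ref)[:, None] * X_ref).sum(axis=0)

# Naive: one inverse-HVP per training sample (n linear solves).
naive = np.array([g_ref @ np.linalg.solve(H, g) for g in G]) / n

# Symmetric: H^{-1} is symmetric, so g_ref^T H^{-1} g_i = g_i^T (H^{-1} g_ref).
v = np.linalg.solve(H, g_ref)                     # the single inverse-HVP
influence = G @ v / n                             # predicted change in ref loss if sample i is removed

# Sanity check against actually retraining without each sample.
base = ref_loss(theta, X_ref, y_ref)
loo = np.array([
    ref_loss(fit_ridge(np.delete(X, i, axis=0), np.delete(y, i), lam), X_ref, y_ref) - base
    for i in range(n)
])
print(np.allclose(naive, influence))              # True
print(np.corrcoef(influence, loo)[0, 1])
```

The cost drops from n inverse-HVPs to one, which is what makes scoring every sample affordable on a large dataset.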

Modeling

Base Model: Llama-2-7B and ChatGLM2-6B

Training Method: Supervised Fine-Tuning (LoRA)

Objective Functions:

Purpose: Optimize LLM to predict next item.

Formally: Minimize negative log-likelihood of next item y given history x.
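Assuming the standard token-level autoregressive form used for LLM supervised fine-tuning (an assumption; the summary states only the sequence-level objective), with frozen base weights Θ₀ and trainable LoRA parameters φ, the objective reads:

```latex
\min_{\phi}\; \mathcal{L}(\phi)
  = -\sum_{(x,y)\in S}\; \sum_{t=1}^{|y|} \log P_{\Theta_0,\,\phi}\big(y_t \mid x,\; y_{<t}\big)
```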

Adaptation: LoRA (Low-Rank Adaptation)

Trainable Parameters: LoRA parameters only (rank and alpha are not specified in the source excerpt; presumably common defaults)

Key Hyperparameters:

selection_ratio_r: 0.02 (2%)
lambda: Hyperparameter balancing influence and effort scores

Compute: Reduces time costs by 97% compared to full fine-tuning. GPU details are not reported in the source excerpt.

Comparison to Prior Work

vs. Random: DEALRec explicitly models sample influence and difficulty
vs. Geometry/Entropy: DEALRec uses influence functions to directly estimate impact on loss, rather than heuristics
vs. GraNd [not cited in paper]: DEALRec uses a surrogate model for efficiency, whereas GraNd typically requires gradients from the target model, which is too expensive for LLMs on the full dataset

Limitations

Relies on a surrogate model which introduces a capability gap (partially mitigated by effort score)
Influence function approximation assumes convexity/smoothness which may not fully hold for deep nets
Requires one full pass of the LLM over the data to compute effort scores (a forward and backward pass per sample to obtain gradient norms, but no parameter updates)

Reproducibility

Code: https://github.com/Linxyhaha/DEALRec

Datasets are also available in the repository; all three are public (MovieLens-1M, Amazon Beauty, Amazon Games).

📊 Experiments & Results

Evaluation Setup

Sequential recommendation (next-item prediction)

Benchmarks:

MovieLens-1M (Movie Recommendation)
Amazon Beauty (Product Recommendation)
Amazon Games (Product Recommendation)

Metrics:

NDCG@10
HR@10
Statistical methodology: Not explicitly reported in the paper

Key Results

Benchmark	Metric	Baseline	This Paper	Δ
MovieLens-1M	NDCG@10	0.3345	0.3412	+0.0067
Training Cost	Time (relative to full fine-tuning)	100%	3%	-97%
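The Δ column can be restated in relative terms with a few lines of arithmetic. The only inputs are the table's own numbers; the ~33x speedup is simply what a 3% time cost implies, not a separately reported figure.

```python
# Values copied from the Key Results table.
baseline_ndcg, dealrec_ndcg = 0.3345, 0.3412
time_fraction = 0.03                          # DEALRec time cost as a fraction of full fine-tuning

abs_gain = dealrec_ndcg - baseline_ndcg       # the table's Δ column
rel_gain = abs_gain / baseline_ndcg           # relative NDCG@10 improvement
speedup = 1 / time_fraction                   # speedup implied by a 97% time cut

print(f"ΔNDCG@10 = {abs_gain:+.4f} ({rel_gain:.1%} relative); ~{speedup:.0f}x faster")
# → ΔNDCG@10 = +0.0067 (2.0% relative); ~33x faster
```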

Experiment Figures

Comparison of fine-tuning paradigms: (a) Few-shot random vs. (b) Surrogate-based pruning vs. DEALRec.

Analysis of the gap between Surrogate Model and LLM.

Main Takeaways

DEALRec consistently outperforms random sampling and heuristic baselines across all datasets.
The method achieves comparable or better performance than full fine-tuning with a fraction (2%) of the data, indicating high redundancy in recommendation datasets.
The combination of Influence Score (general representativeness) and Effort Score (LLM-specific difficulty) is critical; ablating either drops performance.

📚 Prerequisite Knowledge

Prerequisites

Influence Functions (Hampel, 1974; Koh & Liang, 2017)
Hessian-Vector Products (HVP)
Parameter-Efficient Fine-Tuning (LoRA)
Sequential Recommendation

Key Terms

Influence Function: A technique to estimate how model parameters or loss would change if a specific training point were removed or upweighted, without actually retraining

Hessian-Vector Product (HVP): An operation that computes the product of the Hessian matrix (second derivatives) and a vector, used to approximate influence efficiently

Surrogate Model: A smaller, cheaper model (e.g., SASRec) used to estimate data influence, replacing the computationally expensive LLM during the selection phase

Effort Score: The gradient norm of a sample's loss with respect to LLM parameters, measuring how 'hard' or significant that sample is for the LLM specifically

Stratified Sampling: A sampling method that divides the population into subgroups (strata) and samples from each to ensure representation across the distribution
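To connect the Influence Function and HVP entries: when a surrogate is too large to form its Hessian, the single inverse-HVP can be obtained with conjugate gradient using only Hessian-vector products. The sketch below assumes an L2-regularized logistic-regression surrogate (a hypothetical stand-in whose HVP has a closed form) and swaps in conjugate gradient as the solver; influence-function implementations also use stochastic estimators such as LiSSA, and the paper's choice may differ.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def make_hvp(X, theta, lam):
    """Matrix-free HVP for the mean L2-regularized logistic loss at theta."""
    s = sigmoid(X @ theta)
    curv = s * (1.0 - s)                           # per-sample second derivative of the log-loss
    n = X.shape[0]
    def hvp(v):
        return X.T @ (curv * (X @ v)) / n + lam * v    # H v without materializing H
    return hvp

def conjugate_gradient(hvp, b, tol=1e-10, max_iter=None):
    """Solve H x = b for symmetric positive-definite H using only HVPs."""
    max_iter = max_iter or 10 * b.size             # exact arithmetic needs <= b.size steps
    x = np.zeros_like(b)
    r = b.copy()                                   # residual b - H x (x starts at 0)
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Hp = hvp(p)
        alpha = rs / (p @ Hp)
        x += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

rng = np.random.default_rng(0)
n, d, lam = 500, 8, 0.1
X = rng.normal(size=(n, d))
theta = 0.5 * rng.normal(size=d)                   # hypothetical surrogate parameters
hvp = make_hvp(X, theta, lam)
b = rng.normal(size=d)                             # e.g. a summed gradient vector
x = conjugate_gradient(hvp, b)

# Explicit Hessian only to verify the answer; affordable here because d is tiny.
s = sigmoid(X @ theta)
H = X.T @ (X * (s * (1 - s))[:, None]) / n + lam * np.eye(d)
print(np.allclose(x, np.linalg.solve(H, b)))       # True
```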