Weiqin Yang, Bohao Wang, Zhenxiang Xu, Jiawei Chen, Shengjia Zhang, Jingbang Chen, Canghong Jin, Can Wang
Zhejiang University, The Chinese University of Hong Kong, Hangzhou City University
arXiv (2026)
Recommendation, P13N
📝 Paper Summary
LLM-based Recommendation, Sequential Recommendation
BEAR aligns LLM training with inference by adding a regularization term that ensures every token of a positive item ranks high enough to survive the greedy pruning of beam search.
Core Problem
Supervised Fine-Tuning (SFT) maximizes the global (whole-sequence) probability of positive items, but inference uses beam search, which greedily prunes any sequence whose prefix has low probability.
Why it matters:
High-probability items are frequently discarded early because their initial tokens do not rank in the top candidates (training-inference inconsistency)
Empirical analysis shows over 80% of positive items with top-B global probability are pruned before reaching final recommendations in standard models
Concrete Example: Consider the item 'Bocchi the Rock!', which has the highest overall probability (23%). Beam search with width 2 prunes the prefix 'Bocchi' (25%) when two other prefixes, such as 'The' (30%) and 'A' (45%), score higher, so the correct item is never retrieved.
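The arithmetic of this example can be checked with a minimal sketch. Only the three prefix probabilities and the 23% for 'Bocchi the Rock!' come from the example; the other item names and their probabilities are hypothetical fillers chosen so that each prefix's items sum to the prefix probability.

```python
# First-token (prefix) probabilities taken from the example.
prefix_prob = {"A": 0.45, "The": 0.30, "Bocchi": 0.25}

# Full-item probabilities. Only 'Bocchi the Rock!' (0.23) is from the example;
# every other entry is a made-up placeholder.
item_prob = {
    "Bocchi the Rock!": 0.23,
    "Bocchi placeholder": 0.02,
    "A placeholder 1": 0.15,
    "A placeholder 2": 0.15,
    "A placeholder 3": 0.15,
    "The placeholder 1": 0.15,
    "The placeholder 2": 0.15,
}

beam_width = 2

# The item with the highest whole-sequence probability.
best_item = max(item_prob, key=item_prob.get)

# Beam search keeps only the top-`beam_width` prefixes after the first step.
kept_prefixes = sorted(prefix_prob, key=prefix_prob.get, reverse=True)[:beam_width]

# Items that can still be generated once the other prefixes are pruned.
reachable = [name for name in item_prob if name.split()[0] in kept_prefixes]

print("best item:", best_item)
print("kept prefixes:", kept_prefixes)
print("best item still reachable:", best_item in reachable)
```

With width 2 the kept prefixes are 'A' and 'The', so the highest-probability item is unreachable even though no single competing item is more likely than it.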
Key Novelty
Beam-Search-Aware Regularization (BEAR)
Instead of simulating beam search during training, which is computationally expensive, BEAR optimizes a 'necessary condition' for retrieval: every token of the positive item must rank among the top-B candidates at its decoding step.
Introduces a differentiable regularization term using a sigmoid relaxation to penalize the 'pruning margin' (gap between token probability and the B-th best candidate's probability).
Achieves alignment between the training objective and the greedy pruning mechanism of inference without requiring additional forward passes.
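To make the necessary condition concrete, the following sketch checks whether every token of a target item ranks within the top-B next-token candidates. The function names and the second and third step distributions are illustrative assumptions, not taken from the paper; only the first-step probabilities reuse the example above.

```python
def token_rank(step_probs, token):
    """Return the 1-based rank of `token` among next-token candidates."""
    target = step_probs[token]
    return 1 + sum(1 for p in step_probs.values() if p > target)


def satisfies_necessary_condition(steps, tokens, beam_width):
    """True if every target token ranks within the top `beam_width` at its step."""
    return all(
        token_rank(probs, tok) <= beam_width
        for probs, tok in zip(steps, tokens)
    )


# Per-step next-token distributions conditioned on the target prefix.
# Steps 2 and 3 are hypothetical values chosen so the item totals about 23%.
steps = [
    {"A": 0.45, "The": 0.30, "Bocchi": 0.25},
    {"the": 0.95, "and": 0.05},
    {"Rock!": 0.97, "Roll": 0.03},
]
tokens = ["Bocchi", "the", "Rock!"]

print(satisfies_necessary_condition(steps, tokens, beam_width=2))  # False
print(satisfies_necessary_condition(steps, tokens, beam_width=3))  # True
```

The check only needs the next-token distributions along the target sequence, which a teacher-forced forward pass already produces; this is in line with the claim that no additional forward passes are required.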
Architecture
Conceptual illustration of the Training-Inference Inconsistency. Shows how an item with the highest global probability can be pruned early by beam search due to a low-probability prefix.
Evaluation Highlights
Outperforms 9 state-of-the-art fine-tuning baselines across four real-world datasets
Achieves an average performance improvement of 12.50% over baselines
Incurs negligible computational overhead compared to standard SFT, unlike naive beam search simulation methods
Breakthrough Assessment
8/10
Identifies a fundamental disconnect between SFT and Beam Search in LLM recommenders. The solution is theoretically grounded (necessary condition), highly efficient (no extra forward passes), and yields significant empirical gains.
⚙️ Technical Details
Problem Definition
Setting: Sequential Recommendation as Next-Item Generation
Inputs: User historical interaction sequence formatted as a text prompt x
Outputs: Textual description y of the predicted next item
Pipeline Flow
Prompt Construction
LLM Inference
Beam Search Decoding
System Modules
Prompt Constructor
Converts user history into a structured textual prompt using a predefined template
Model or implementation: Rule-based template
Fine-tuned LLM
Estimates the conditional probability of next tokens given the prompt and generated prefix
Model or implementation: Large Language Model (backbone architecture not specified in text)
Beam Search Decoder
Iteratively expands partial sequences, keeping only the B highest-scoring prefixes at each step and pruning the rest
Model or implementation: Algorithm (Beam Search)
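The three modules can be sketched end to end as follows. This is a simplified illustration under stated assumptions: the prompt template is hypothetical (the paper's template is not given here), the fine-tuned LLM is replaced by a hand-written lookup table `TOY_MODEL`, and all items have the same fixed length so that no end-of-sequence handling is needed.

```python
import math


def build_prompt(history):
    """Hypothetical rule-based template; not the paper's actual wording."""
    return "The user has interacted with: " + ", ".join(history) + ". Next item:"


def beam_search(next_token_probs, beam_width, num_steps):
    """Keep the `beam_width` best prefixes by cumulative log-probability.

    `next_token_probs(prefix)` returns {token: probability} for a tuple prefix.
    """
    beams = [((), 0.0)]
    for _ in range(num_steps):
        candidates = []
        for prefix, score in beams:
            for tok, p in next_token_probs(prefix).items():
                candidates.append((prefix + (tok,), score + math.log(p)))
        # Greedy pruning: everything outside the top-B is discarded for good.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams


# Stand-in for the fine-tuned LLM (all values are illustrative).
TOY_MODEL = {
    (): {"A": 0.45, "The": 0.30, "Bocchi": 0.25},
    ("A",): {"x": 0.5, "y": 0.5},
    ("The",): {"x": 0.5, "y": 0.5},
    ("Bocchi",): {"Rock": 0.92, "z": 0.08},
}

prompt = build_prompt(["item 1", "item 2"])
narrow = beam_search(lambda prefix: TOY_MODEL[prefix], beam_width=2, num_steps=2)
wide = beam_search(lambda prefix: TOY_MODEL[prefix], beam_width=3, num_steps=2)

print(prompt)
print([seq for seq, _ in narrow])
print([seq for seq, _ in wide])
```

In this toy setup the sequence ('Bocchi', 'Rock') has the highest joint probability (0.25 × 0.92 = 0.23), yet it only appears in the output when the beam is wide enough to keep the 'Bocchi' prefix after the first step.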
Modeling
Base Model: Various LLM-based RS backbones (specific architectures not listed in provided text)
Training Method: Supervised Fine-Tuning with BEAR Regularization
Objective Functions:
Purpose: Maximize probability of positive items while ensuring tokens survive beam search pruning.
Formally: L_BEAR = L_SFT + λ * L_Reg
Purpose: Standard SFT loss.
Formally: L_SFT = -log P(y|x)
Purpose: Regularization to minimize pruning risk.
Formally: L_Reg = sum_t log(1 + exp(-Δ^B_t / ξ)), where Δ^B_t is the pruning margin at token step t and ξ is the temperature
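The objective above can be sketched as a plain-Python forward computation. Two assumptions are made here and should be checked against the paper: the margin is oriented as the target token's log-probability minus the B-th largest log-probability, so that the written formula grows when the token falls below the cutoff (with the opposite orientation, the sign inside the exponential flips); and the values of ξ and λ are placeholders, since the summary does not report them. A real implementation would use tensor operations in an autodiff framework.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pruning_margin(step_log_probs, target, beam_width):
    """Assumed orientation: target log-prob minus the B-th largest log-prob."""
    cutoff = sorted(step_log_probs.values(), reverse=True)[beam_width - 1]
    return step_log_probs[target] - cutoff


def bear_loss(steps, targets, beam_width, xi, lam):
    """Return (L_BEAR, L_SFT, L_Reg) for one positive item."""
    # L_SFT = -log P(y|x), expanded as a sum over token log-probabilities.
    sft = -sum(lp[t] for lp, t in zip(steps, targets))
    # L_Reg = sum_t log(1 + exp(-margin_t / xi)).
    reg = sum(
        softplus(-pruning_margin(lp, t, beam_width) / xi)
        for lp, t in zip(steps, targets)
    )
    return sft + lam * reg, sft, reg


# One decoding step with illustrative probabilities.
step = {tok: math.log(p) for tok, p in {"A": 0.45, "The": 0.30, "Bocchi": 0.25}.items()}

# xi=1.0 and lam=0.1 are placeholder values, not the paper's settings.
loss_pruned = bear_loss([step], ["Bocchi"], beam_width=2, xi=1.0, lam=0.1)
loss_safe = bear_loss([step], ["A"], beam_width=2, xi=1.0, lam=0.1)
print(loss_pruned)
print(loss_safe)
```

Under these assumptions, a target token outside the top-B ('Bocchi' at width 2) receives a larger regularization penalty than one safely inside it ('A'), and a smaller ξ sharpens that contrast.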
Key Hyperparameters:
beam_width_B: B (variable; usually a small integer, e.g. 10)
temperature_ξ: Controls smoothness of the sigmoid surrogate (value not reported in text)
regularization_weight_λ: Controls strength of BEAR regularization (value not reported in text)
Compute: Negligible additional overhead compared to standard SFT; no additional forward passes required
Comparison to Prior Work
vs. SFT: BEAR adds a regularization term to enforce the necessary condition for beam search survival, reducing training-inference inconsistency
Limitations
Optimizes a necessary condition (token in top-B) rather than the sufficient condition (sequence in top-B), which is a relaxation
Relies on the assumption that violating the necessary condition is the primary cause of incorrect pruning; the paper's empirical analysis attributes more than 70% of such cases to it
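The gap between the necessary and the sufficient condition can be seen in a small counterexample. The toy probabilities below are hypothetical and chosen only to show a target whose tokens all rank within the top-B at their own steps, while the sequence as a whole is still pruned.

```python
def rank_of(scores, key):
    """Return the 1-based rank of `key` among the values of `scores`."""
    target = scores[key]
    return 1 + sum(1 for v in scores.values() if v > target)


# Hypothetical two-step model: prefix tuple -> next-token probabilities.
MODEL = {
    (): {"a": 0.5, "b": 0.3, "c": 0.2},
    ("a",): {"x": 0.5, "y": 0.5},
    ("b",): {"z": 0.6, "w": 0.4},
}
target = ("b", "z")
B = 2

# Token-level check (the necessary condition): rank of each target token
# among the next-token candidates given its own prefix.
token_ranks = [rank_of(MODEL[target[:i]], target[i]) for i in range(len(target))]

# Sequence-level check (what beam search actually does): rank prefixes by
# joint probability and keep the top B at every step.
step1 = {(tok,): p for tok, p in MODEL[()].items()}
beam1 = sorted(step1, key=step1.get, reverse=True)[:B]
step2 = {
    prefix + (tok,): step1[prefix] * p
    for prefix in beam1
    for tok, p in MODEL[prefix].items()
}
beam2 = sorted(step2, key=step2.get, reverse=True)[:B]

print("token ranks:", token_ranks)
print("beam after step 2:", beam2)
print("target survives:", target in beam2)
```

Here both target tokens rank within the top 2, but the joint probability of ('b', 'z') is 0.18, below the two extensions of 'a' at 0.25 each, so the target is pruned at the second step.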
Reproducibility
Code will be released upon acceptance. No specific repository URL provided in the text. Hyperparameters (lambda, xi, beam width for training) are not detailed in the provided text.
📊 Experiments & Results
Evaluation Setup
Sequential recommendation on real-world datasets
Benchmarks:
Book (Sequential Recommendation)
Toy (Sequential Recommendation)
Metrics:
Ranking metrics (Top-K)
Statistical methodology: Not explicitly reported in the paper
Key Results
Benchmark | Metric | Baseline | This Paper | Δ
Average across 4 datasets | Performance improvement (%) | 0.00 | 12.50 | +12.50
Experiment Figures
Analysis of pruning causes. Pie chart or bar graph showing the proportion of incorrect pruning cases caused by violating the necessary condition.
Computational efficiency comparison. Bar chart comparing training time/cost of SFT, Naive Beam Search Simulation, and BEAR.
Main Takeaways
Training-Inference inconsistency is a major bottleneck: >80% of high-global-probability positive items are pruned early by beam search in standard SFT models.
The primary cause of incorrect pruning (>70% of cases) is the violation of the 'necessary condition': a positive token failing to rank in the top-B candidates at its specific step.
BEAR significantly outperforms baselines by explicitly optimizing this necessary condition, showing that aligning training objectives with inference mechanics is critical for LLM-based recommendation.
📚 Prerequisite Knowledge
Prerequisites
Generative Recommendation with LLMs
Beam Search Decoding
Supervised Fine-Tuning (SFT)
Key Terms
SFT: Supervised Fine-Tuning—adapting a pre-trained Large Language Model to a specific task using labeled examples
Beam Search: A heuristic search algorithm that expands only the most promising partial sequences, keeping a limited set of candidates (the beam) at each step
Beam Width (B): The number of candidate sequences retained at each step of the beam search decoding process
Greedy Pruning: The process in beam search where candidates falling outside the top-B probability ranking are discarded immediately
Pruning Margin: The difference between the log-probability of the B-th best candidate and the current token's probability; a positive margin implies the token will be pruned
Training-Inference Inconsistency: The mismatch between the training objective (maximizing global item probability) and the inference mechanism (greedy local pruning)
Necessary Condition: The requirement that, for a sequence to survive beam search, every one of its tokens must rank within the top-B next-token candidates at its respective decoding step