
Accelerating Listwise Reranking: Reproducing and Enhancing FIRST

Z Chen, R Pradeep, J Lin
University of Waterloo
2025
RAG Benchmark

Paper Summary

Listwise Reranking Efficient Retrieval
FIRST accelerates listwise reranking by determining document order solely from the first generated token's logits, achieving 40% latency reduction while maintaining effectiveness across diverse backbones and datasets.
Core Problem
Traditional listwise reranking using LLMs is slow because it requires auto-regressive generation of complete document identifier permutations.
Why it matters:
  • High inference latency makes deploying powerful LLM rerankers prohibitive for real-time search applications
  • Standard language modeling objectives uniformly penalize errors at all positions, failing to prioritize top-ranked documents crucial for retrieval effectiveness
  • Existing efficient methods often sacrifice ranking quality for speed
Concrete Example: A traditional listwise reranker generates a sequence like '2 > 1 > 4 > 3' token by token. Each token requires its own forward pass, so with a large LLM the decoding of the full permutation becomes the bottleneck. FIRST avoids this by reading only the logits of the first generated token and inferring the full ranking from them in a single step.
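The example above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `rank_from_first_token_logits`, the toy 8-token vocabulary, and the mapping from identifiers to token ids are all assumptions made for the sketch.

```python
from typing import Dict, List


def rank_from_first_token_logits(
    first_token_logits: List[float],
    identifier_token_ids: Dict[str, int],
) -> List[str]:
    """Order candidate identifiers by the logit of their token at the first decoding step."""
    scored = [
        (first_token_logits[token_id], identifier)
        for identifier, token_id in identifier_token_ids.items()
    ]
    # Highest logit first; ties are broken by identifier so the output is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [identifier for _, identifier in scored]


# Hypothetical logits over a toy vocabulary of 8 tokens.
# Token ids 3..6 are assumed to be the tokens for identifiers "1".."4".
logits = [0.1, -2.0, 0.3, 1.2, 2.5, -0.7, 0.9, 0.0]
identifier_token_ids = {"1": 3, "2": 4, "3": 5, "4": 6}

ranking = rank_from_first_token_logits(logits, identifier_token_ids)
print(" > ".join(ranking))
```

With these made-up logits the sketch reproduces the ordering from the example, 2 > 1 > 4 > 3, without decoding any token beyond the first.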
Key Novelty
Single-Token Listwise Reranking via Logits
  • Instead of generating a text sequence of document IDs (e.g., 'A > B > C'), the model is trained so that the logits at the first generated position, read over the candidate identifier tokens, serve as relevance scores for all candidate documents simultaneously
  • Combines a specific learning-to-rank loss (focusing on pairwise errors) with standard language modeling loss to align the model's first-token probability distribution with the ground truth ranking
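One way to picture the combined objective is the sketch below. It is an assumption-laden illustration rather than the paper's exact formulation: the summary only states that a pairwise learning-to-rank loss is combined with the language modeling loss, so the RankNet-style logistic form, the `1 / (rank_i + rank_j)` weighting that emphasizes top-ranked pairs, and the `rank_weight` parameter are all hypothetical choices.

```python
import math
from typing import List


def pairwise_ranking_loss(scores: List[float], true_ranks: List[int]) -> float:
    """RankNet-style logistic loss over every pair where item i should outrank item j.

    scores[i] is the first-token logit of candidate i; true_ranks[i] is its
    ground-truth rank (1 = most relevant). The pair weight is an assumption.
    """
    loss = 0.0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if true_ranks[i] < true_ranks[j]:
                # Assumed weighting: pairs involving top ranks count more.
                weight = 1.0 / (true_ranks[i] + true_ranks[j])
                # Penalize the pair when the lower-ranked item scores higher.
                loss += weight * math.log1p(math.exp(scores[j] - scores[i]))
    return loss


def combined_loss(
    lm_loss: float,
    scores: List[float],
    true_ranks: List[int],
    rank_weight: float = 1.0,  # hypothetical mixing coefficient
) -> float:
    """Language modeling loss plus a weighted pairwise ranking term."""
    return lm_loss + rank_weight * pairwise_ranking_loss(scores, true_ranks)


true_ranks = [1, 2, 3]
well_ordered = pairwise_ranking_loss([3.0, 2.0, 1.0], true_ranks)
mis_ordered = pairwise_ranking_loss([1.0, 2.0, 3.0], true_ranks)
print(well_ordered < mis_ordered)
```

The ranking term shrinks when the first-token logits agree with the ground-truth order, which is the alignment the bullet above describes; a uniform token-level language modeling loss alone gives no extra weight to mistakes at the top of the list.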
Evaluation Highlights
  • Achieved ~40% reduction in inference latency compared to full-generation RankZephyr/RankMistral on TREC Deep Learning datasets
  • FirstMistral (FIRST on Mistral-v0.3) surpassed the original FIRST-Reddy implementation on 8 out of 11 BEIR datasets
  • Demonstrated robust out-of-domain generalization on TREC DL19–23, with FirstMistral (0.7209 nDCG@10) matching full-generation RankZephyr (0.7166 nDCG@10)
Breakthrough Assessment
7/10
Solid reproduction and extension of existing work (FIRST). Validates efficiency gains and generalizes to new backbones, but identifies tokenization issues and negative interference from LM pre-training.