
LLMDistill4Ads: Using Cross-Encoders to Distill from LLM Signals for Advertiser Keyphrase Recommendations at eBay

Soumik Dey, Benjamin Braun, Naveen Ravipati, Hansi Wu, Binbin Li
eBay Inc
arXiv (2025)
Recommendation P13N

📝 Paper Summary

E-commerce Search & Advertising, Dense Retrieval, Knowledge Distillation
LLMDistill4Ads improves ad keyphrase recommendations by distilling relevance judgments from an LLM teacher through a cross-encoder assistant into a scalable bi-encoder student using Pearson correlation loss.
Core Problem
Click-based training data for ad recommendations is sparse and biased: it only reflects keyphrases previously approved by the search engine ('middleman bias') and is subject to ranking position bias.
Why it matters:
  • Items ranked lower receive fewer clicks regardless of relevance, making lack of clicks an unreliable negative signal
  • Training on biased click logs perpetuates existing system limitations, preventing the discovery of new, relevant keyphrases for advertisers
  • High latency of accurate cross-encoder models prevents their direct use in large-scale retrieval systems with billions of items
Concrete Example: An item might be relevant to 'vintage lamp', but if the current search engine never shows the item for that query, no clicks are generated. A model trained only on clicks will learn that 'vintage lamp' is irrelevant, whereas an LLM or human judge would identify the missed opportunity.
Key Novelty
Two-Stage Multi-Task Distillation (LLM → CE → BE)
  • Uses a 'Teacher-Assistant' framework where a heavy LLM labels data to train a Cross-Encoder assistant, which then teaches a lightweight Bi-Encoder student
  • Employs a multi-task objective combining supervised click data (CTR), Search Relevance scores (SR), and Pearson correlation-based distillation from the assistant to calibrate ranking scores
  • Integrates heterogeneous signals to mitigate 'middleman bias', allowing the model to learn from accepted, rejected, and unseen keyphrases
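The multi-task objective described above can be illustrated with a minimal sketch. The summary does not give the exact formula, so everything here is an assumption made for illustration: the distillation term is written as 1 minus the Pearson correlation between student and assistant scores over a batch, the three task losses are combined as a weighted sum, and the names `pearson`, `pearson_distill_loss`, `multitask_loss`, and `weights` are hypothetical. Plain Python is used instead of a deep learning framework to keep the example self-contained.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    norm = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    # Assumption: treat a constant score vector as uncorrelated (r = 0).
    return cov / norm if norm > 0 else 0.0


def pearson_distill_loss(student_scores, assistant_scores):
    """Assumed form of the distillation term: 1 - r.

    Near 0 when the bi-encoder (student) scores are a positive linear
    function of the cross-encoder (assistant) scores, near 2 when they
    are perfectly anti-correlated.
    """
    return 1.0 - pearson(student_scores, assistant_scores)


def multitask_loss(ctr_loss, sr_loss, distill_loss, weights=(1.0, 1.0, 1.0)):
    """Assumed combination: weighted sum of CTR, SR, and distillation terms."""
    w_ctr, w_sr, w_kd = weights
    return w_ctr * ctr_loss + w_sr * sr_loss + w_kd * distill_loss


# Hypothetical batch: student similarities vs. assistant relevance scores.
student = [0.2, 0.4, 0.6]
assistant = [1.0, 2.0, 3.0]
kd = pearson_distill_loss(student, assistant)
total = multitask_loss(ctr_loss=0.5, sr_loss=0.25, distill_loss=kd)
```

Because a correlation is invariant to positive scaling and shifting, a loss of this kind only asks the student to follow the assistant's relative scoring pattern within a batch, not to reproduce its absolute score values. That makes it a plausible fit when the student's similarity scores and the assistant's relevance scores live on different scales.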
Architecture
Figure 2: The multi-task training framework, showing the Teacher-Assistant-Student hierarchy and data sources.
Evaluation Highlights
  • +51.26% increase in Gross Merchandise Bought (GMB) in a 12-day online A/B test compared to a CTR-only baseline
  • +38.69% improvement in Return on Ad Spend (ROAS) in the same online test
  • +11.75% increase in average adopted keyphrase count per item, indicating better alignment with seller preferences
Breakthrough Assessment
7/10
Strong industrial application showing significant online business gains. While the teacher-assistant distillation architecture is known, the specific application to mitigating middleman bias in ads with Pearson loss is impactful.