
Generative Recommendation for Large-Scale Advertising

Ben Xue, Dan Liu, Lixiang Wang, Mingjie Sun, Peng Wang, Pengfei Zhang, Shaoyun Shi, Tianyu Xu, Yunhao Sha, Zhiqiang Liu, Bo Kong, Bo Wang, Hang Yang, Jieting Xue, Junhao Wang, Shengyu Wang, Shuping Hui, Wencai Ye, Xiao Lin, Yongzhi Li, Yuhang Chen, Zhihui Yin, Quan Chen, Shiyang Wen, Wenjin Wu, Han Li, Guorui Zhou, Changcheng Li, Peng Jiang
Kuaishou Technology
arXiv (2026)
Recommendation, MM, RL

📝 Paper Summary

Generative Recommendation, Computational Advertising, Large-Scale Recommender Systems
GR4AD adapts generative recommendation for high-throughput advertising by introducing lazy autoregressive decoding for speed and a list-wise reinforcement learning objective for business value alignment.
Core Problem
LLM-style generative recommendation does not transfer directly to real-time advertising: standard autoregressive decoding is too slow for high-traffic multi-candidate generation, and standard next-token training ignores list-wise business metrics such as eCPM and NDCG.
Why it matters:
  • Advertising systems have strict latency budgets (<100ms) that interactive LLM serving techniques cannot meet when generating hundreds of candidates
  • Plain next-token prediction optimizes the likelihood of individual tokens but does not capture the ranked-list utility that drives ad revenue (eCPM)
  • Standard tokenization misses critical non-semantic business signals (e.g., conversion type, account ID) that drastically alter ad delivery logic
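To make the ranked-list point above concrete, here is a minimal sketch of NDCG computed over a served ad list. Using each ad's eCPM directly as the gain is an assumption made for illustration, as are the function names and the eCPM values; the summary does not state how the paper maps eCPM to gains.

```python
import math

def dcg(gains):
    """Discounted cumulative gain: the gain at 0-based rank i is divided by log2(i + 2)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg(ecpm_in_served_order):
    """NDCG of a ranked ad list, using each ad's eCPM as its gain (an assumption)."""
    ideal = dcg(sorted(ecpm_in_served_order, reverse=True))
    return dcg(ecpm_in_served_order) / ideal if ideal > 0 else 0.0

# Hypothetical eCPM values for the same three ads served in two different orders.
best_first = [9.0, 4.0, 1.0]
worst_first = [1.0, 4.0, 9.0]

score_best = ndcg(best_first)
score_worst = ndcg(worst_first)
```

Both lists contain the same ads, so a per-item likelihood objective cannot tell them apart, while the list-wise score is 1.0 only for the order that puts the highest-eCPM ad first.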
Concrete Example: Two ads with an identical creative (the same video) may target completely different users depending on their conversion goals (e.g., 'app install' vs. 'purchase'). A model that uses purely semantic IDs treats them as the same item, causing ID collisions. GR4AD avoids this by hashing non-semantic business signals into the final token layer.
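The example above can be sketched as follows. This is an illustrative toy, not the paper's tokenizer: the function names, the choice of SHA-256, the vocabulary size, and the semantic codes are all assumptions.

```python
import hashlib

def business_token(conversion_type, account_id, vocab_size=4096):
    """Hash non-semantic business fields into one token id in [0, vocab_size)."""
    key = f"{conversion_type}|{account_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % vocab_size

def unified_id(semantic_codes, conversion_type, account_id):
    """Append the hashed business token after the content-derived codes."""
    return tuple(semantic_codes) + (business_token(conversion_type, account_id),)

# Same creative (same hypothetical semantic codes), different conversion goals.
video_codes = (17, 203, 5)
install_ad = unified_id(video_codes, "app_install", "acct_1")
purchase_ad = unified_id(video_codes, "purchase", "acct_1")
```

The two ads share the content-derived prefix and are separated only by the last token. A plain modulo hash like this one can still collide for a small vocabulary, so it does not by itself give the collision-free property the summary attributes to UA-SID.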
Key Novelty
Production-Oriented Generative Ad Recommendation (GR4AD)
  • LazyAR Decoder: Delays autoregressive dependencies to later layers, allowing the first K layers to be computed once in parallel for all beams, drastically speeding up multi-candidate generation
  • RSPO (Ranking-Guided Softmax Preference Optimization): A list-wise reinforcement learning objective that directly optimizes ranking metrics (NDCG) derived from business values (eCPM) rather than just next-token likelihood
  • UA-SID (Unified Ad Semantic ID): Fuses multimodal content semantics with hash-based business signals (conversion type) to create collision-free, meaningful discrete identifiers
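A toy sketch of the LazyAR idea described above, assuming that the first K layers depend only on the shared context and that each beam's previous token is injected at layer K. The layer function, the additive injection, and all values are hypothetical stand-ins; the paper's actual architecture is not specified in this summary.

```python
import math

calls = {"layers": 0}  # counts layer evaluations so the two schemes can be compared

def layer(vec, weight):
    """Toy stand-in for one decoder layer: elementwise tanh of a scaled input."""
    calls["layers"] += 1
    return [math.tanh(weight * x) for x in vec]

def lazy_decode_step(context, beam_token_embs, weights, k):
    """First k layers see only the shared context; later layers see each beam's last token."""
    shared = context
    for w in weights[:k]:            # computed once, reused by every beam
        shared = layer(shared, w)
    outputs = []
    for emb in beam_token_embs:      # beam-specific computation starts at layer k
        h = [s + e for s, e in zip(shared, emb)]
        for w in weights[k:]:
            h = layer(h, w)
        outputs.append(h)
    return outputs

def naive_decode_step(context, beam_token_embs, weights):
    """Baseline: every beam runs all layers on its own input."""
    outputs = []
    for emb in beam_token_embs:
        h = [c + e for c, e in zip(context, emb)]
        for w in weights:
            h = layer(h, w)
        outputs.append(h)
    return outputs

context = [0.1, -0.2, 0.3]
beams = [[0.0, 0.1, 0.0], [0.2, 0.0, 0.0], [0.0, 0.0, -0.1], [0.1, 0.1, 0.1]]
weights = [1.0, 0.9, 1.1, 0.8, 1.2, 1.0]  # six toy layers

calls["layers"] = 0
lazy_out = lazy_decode_step(context, beams, weights, k=4)
lazy_calls = calls["layers"]     # 4 shared layers + 4 beams * 2 layers

calls["layers"] = 0
naive_out = naive_decode_step(context, beams, weights)
naive_calls = calls["layers"]    # 4 beams * 6 layers
```

With 4 beams, 6 layers, and k = 4, the toy runs 12 layer evaluations instead of 24. The two functions are different models rather than two ways of computing the same output, which reflects that delaying the autoregressive dependency is an architectural change and not a caching trick.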
Architecture
Figure 3: Comparison of the standard autoregressive, DeepSeek-MTP, and proposed LazyAR decoder architectures.
Evaluation Highlights
  • +4.2% ad revenue improvement (RPM) in online A/B tests against a production DLRM-based stack serving 400M users
  • Achieves >500 QPS per L20 GPU with <100ms latency, enabling real-time generative serving at massive scale
  • +1.1% revenue gain specifically from the RSPO alignment component compared to standard supervised fine-tuning baselines
Breakthrough Assessment
9/10
Successfully deploys generative recommendation in a massive-scale, latency-critical ad system (400M users), solving critical bottlenecks in serving efficiency (LazyAR) and value alignment (RSPO) that previously hindered industrial adoption.