From Logs to Language: Learning Optimal Verbalization for LLM-Based Recommendation in Production

Yucheng Shi, Ying Li, Yu Wang, Yesu Feng, Arjun Rao, Rein Houthooft, Shradha Sehgal, Jin Wang, Hao Zhen, Ninghao Liu, Linas Baltrunas
University of Georgia, Netflix, Capital One, Hong Kong Polytechnic University
arXiv (2026)
Recommendation, P13N, RL

📝 Paper Summary

LLM-based Recommendation, Generative Recommender Systems, Prompt Optimization / Verbalization
This paper proposes a framework that learns to translate raw user interaction logs into optimized natural language summaries (verbalizations) by using recommendation accuracy as a reward signal for reinforcement learning.
Core Problem
Current LLM-based recommenders rely on rigid, template-based methods to convert user history into text, which creates parsing overhead, includes noise, and lacks semantic context.
Why it matters:
  • Template-based concatenation forces LLMs to reason over granular, heterogeneous logs rather than synthesized user preferences
  • Valuable signal is lost when raw logs are not summarized or enriched with metadata, hurting performance on cold-start items
  • Standard prompt engineering is insufficient because optimal verbalization depends on specific user history instances
Concrete Example: A template might output '20250608, Monday... Play, Duration: 80.08 min', which is noisy and hard to parse. An optimized verbalizer might rewrite this as 'The user showed strong preference for dark thrillers by binge-watching 5 episodes of Stranger Things', directly exposing the preference signal.
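To make the template baseline concrete, here is a minimal sketch of template-based concatenation. The `LogEvent` fields and the `template_verbalize` helper are hypothetical and chosen only for illustration; the summary does not specify the actual log schema or template used in the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LogEvent:
    # Hypothetical fields; the real production schema is not described here.
    date: str
    weekday: str
    title: str
    action: str
    duration_min: float


def template_verbalize(events: List[LogEvent]) -> str:
    """Render each event with one fixed template, one line per event."""
    return "\n".join(
        f"{e.date}, {e.weekday}, {e.title}, {e.action}, "
        f"Duration: {e.duration_min:.2f} min"
        for e in events
    )


if __name__ == "__main__":
    history = [
        LogEvent("20250609", "Monday", "Example Show", "Play", 80.08),
        LogEvent("20250610", "Tuesday", "Example Show", "Play", 42.5),
    ]
    print(template_verbalize(history))
```

Every event gets the same treatment regardless of how informative it is, so the downstream LLM has to infer the preference signal from granular rows. A learned verbalizer would instead be free to aggregate or summarize them.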
Key Novelty
Two-Stage GRPO Framework for Verbalization and Reasoning
  • Decomposes recommendation into a 'Verbalizer' (rewrites logs into text) and a 'Reasoner' (predicts next item from text)
  • Trains the Verbalizer with RL (GRPO), using an Oracle LLM's prediction accuracy on the rewritten text as the reward
  • Subsequently fine-tunes the Reasoner on the distribution of optimized verbalizations
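The reward signal for the Verbalizer can be sketched roughly as follows. This is not the paper's implementation. It assumes a binary reward (1 if the oracle predicts the held-out next item from a candidate verbalization, 0 otherwise) and the standard group-relative normalization used in GRPO, here with a population standard deviation. The `oracle_predict` callable and both function names are hypothetical stand-ins.

```python
from statistics import mean, pstdev
from typing import Callable, List


def score_verbalizations(
    candidates: List[str],
    target_item: str,
    oracle_predict: Callable[[str], str],
) -> List[float]:
    """Assumed binary reward: 1.0 if the oracle recovers the target item."""
    return [
        1.0 if oracle_predict(text) == target_item else 0.0
        for text in candidates
    ]


def group_relative_advantages(
    rewards: List[float], eps: float = 1e-8
) -> List[float]:
    """Normalize rewards within one sampled group (same user history)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Toy oracle standing in for an LLM call: keys on one word in the text.
    def toy_oracle(text: str) -> str:
        return "dark_thriller_show" if "thriller" in text else "other_show"

    group = [
        "User binge-watched dark thrillers last week.",
        "20250609, Monday, Play, Duration: 80.08 min",
        "User prefers thriller series with short episodes.",
        "User watched something on Monday.",
    ]
    rewards = score_verbalizations(group, "dark_thriller_show", toy_oracle)
    print(rewards)
    print(group_relative_advantages(rewards))
```

Candidates that help the oracle more than their group's average receive positive advantages and are reinforced. When every candidate in a group earns the same reward, all advantages are zero and that group contributes no learning signal.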
Evaluation Highlights
  • Achieves 92.9% relative improvement in discovery item recommendation accuracy over template-based baselines on a large industrial dataset
  • The Verbalizer's learned transformations account for a large share of the improvement on their own: roughly 50 percentage points of the total gain, compared with training only the Reasoner on raw templates
  • Rewrite-based verbalization outperforms action-based (filtering/enriching) verbalization by enabling aggregation and summarization strategies
Breakthrough Assessment
8/10
A significant industrial application demonstrating that learning how to represent data (verbalization) is as critical as the reasoning model itself. The large relative gain (+92.9%) suggests a major inefficiency in current template-based approaches.