Automating Personalization: Prompt Optimization for Recommendation Reranking

Chen Wang, Mingdai Yang, Zhiwei Liu, Pan Li, Linsey Pang, Qingsong Wen, Philip Yu
University of Illinois Chicago, The University of Chicago, Salesforce AI Research, Georgia Institute of Technology, Squirrel Ai Learning
arXiv (2025)
Recommendation P13N Reasoning

📝 Paper Summary

Recommendation Reranking Prompt Optimization
AGP improves recommendation reranking by automatically refining the user profile generation prompt using batched, position-aware feedback that identifies specific ranking errors.
Core Problem
LLM-based reranking relies on manually crafted prompts that fail to scale or capture nuanced preferences from noisy item metadata, while existing optimization methods use aggregated metrics that lack actionable guidance.
Why it matters:
  • Manual prompt engineering is labor-intensive, static, and prone to trial-and-error, limiting scalability across diverse user behaviors.
  • Standard optimization metrics like NDCG provide only a general score, failing to tell the LLM *why* a specific ranking was poor or how to fix it.
  • Unstructured metadata (e.g., noisy titles) makes it difficult for standard prompts to infer accurate user profiles.
Concrete Example: If a user likes 'Sci-Fi' but the LLM ranks a relevant movie 5th instead of 1st, standard methods simply report a lower NDCG score. AGP generates specific feedback stating 'Item ranked 5th should be 1st', prompting the system to refine the profile generator to better emphasize 'Sci-Fi' preferences.
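To make the example concrete, the sketch below computes NDCG@k for a single user whose only relevant item sits at rank 5. It assumes binary relevance and the standard log2 discount. The function name `ndcg_at_k` and the item IDs are illustrative, not taken from the paper. The result is a single number that says the ranking is poor but not which item is misplaced.

```python
import math

def ndcg_at_k(ranked_items, relevant_items, k=10):
    """NDCG@k with binary relevance (assumption: gain is 1 for a relevant item)."""
    # Discounted gain of every relevant item found in the top k (ranks are 1-based).
    dcg = sum(
        1.0 / math.log2(pos + 1)
        for pos, item in enumerate(ranked_items[:k], start=1)
        if item in relevant_items
    )
    # Ideal DCG: all relevant items packed into the top positions.
    ideal_hits = min(len(relevant_items), k)
    idcg = sum(1.0 / math.log2(pos + 1) for pos in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical ranking: the relevant Sci-Fi movie is placed 5th instead of 1st.
ranking = ["a", "b", "c", "d", "scifi_movie"]
score = ndcg_at_k(ranking, {"scifi_movie"})
print(round(score, 3))  # about 0.387, versus 1.0 if the movie were ranked first
```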
Key Novelty
Auto-Guided Prompt Refinement (AGP)
  • Optimizes the *user profile generation* prompt rather than the final reranking prompt, allowing the LLM to better summarize preferences before ranking.
  • Uses *Position-Based Feedback* to generate explicit textual instructions based on the gap between an item's predicted rank and its ideal rank.
  • Employs *Batched Training* to aggregate feedback across multiple users, preventing the prompt from overfitting to individual user quirks.
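The position-based feedback and batching ideas above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names `position_feedback` and `batch_feedback`, the message wording, and the choice to flag only items ranked lower than their ideal position are all hypothetical. The step that passes the pooled feedback to an LLM to rewrite the profile generation prompt is omitted.

```python
def position_feedback(predicted_ranking, ideal_ranking):
    """Return one textual message per item ranked below its ideal position.

    Both arguments are lists of item IDs; ranks are 1-based. `ideal_ranking`
    lists the relevant items in their desired order (an assumption of this sketch).
    """
    ideal_pos = {item: i for i, item in enumerate(ideal_ranking, start=1)}
    messages = []
    for pred_pos, item in enumerate(predicted_ranking, start=1):
        target = ideal_pos.get(item)
        if target is not None and target < pred_pos:
            messages.append(f"Item '{item}' ranked {pred_pos} should be {target}")
    return messages

def batch_feedback(batch):
    """Pool feedback across a batch of users so no single user dominates.

    `batch` maps a user ID to a (predicted_ranking, ideal_ranking) pair.
    """
    pooled = []
    for user_id, (predicted, ideal) in batch.items():
        for msg in position_feedback(predicted, ideal):
            pooled.append(f"[user {user_id}] {msg}")
    return pooled

# Hypothetical batch of two users; only u1 has a misplaced relevant item.
batch = {
    "u1": (["a", "b", "c", "d", "dune"], ["dune"]),
    "u2": (["x", "y"], ["x"]),
}
feedback = batch_feedback(batch)
print(feedback)  # ["[user u1] Item 'dune' ranked 5 should be 1"]
```

In a full pipeline, the pooled messages would be handed to an optimizer LLM together with the current profile generation prompt, so the revision reflects errors seen across the whole batch rather than one user's ranking.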
Evaluation Highlights
  • Achieves improvements of 5.61–20.68% in NDCG@10 over baseline models (LightGCN, SASRec) across Amazon, Yelp, and Goodreads datasets.
  • Demonstrates high data efficiency by reaching optimal performance with only 100 training users.
  • Enhances graph-based recommenders (LightGCN) significantly, showing 9.36–20.68% gains by injecting semantic personalization into collaborative filtering results.
Breakthrough Assessment
7/10
Effective application of LLM self-optimization to recommendation. The shift from optimizing reranking directly to optimizing profile generation with position-based feedback is a clever, interpretable design choice yielding strong results.