
When Personalization Meets Reality: A Multi-Faceted Analysis of Personalized Preference Learning

Yijiang River Dong, Tiancheng Hu, Yinhong Liu, Ahmet Ustun, Nigel Collier
University of Cambridge, Cohere For AI
Conference on Empirical Methods in Natural Language Processing (2025)
P13N · RL · Benchmark

📝 Paper Summary

Personalized Preference Learning · Reinforcement Learning from Human Feedback (RLHF)
A comprehensive benchmarking framework reveals that personalization improves preference modeling for diverse users, but its benefit depends heavily on how much users in a dataset disagree, and it incurs significant costs in safety and reasoning capabilities.
Core Problem
Standard RLHF assumes that all users share the same preferences, which marginalizes minority viewpoints. Meanwhile, existing personalization research evaluates methods on disjoint datasets and does not measure unintended side effects such as safety degradation.
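To make the homogeneity assumption concrete: standard RLHF reward models are commonly trained with a Bradley-Terry objective, in which a single reward function r_φ(x, y) scores every prompt x and response y for all users (y_w is the preferred response, y_l the rejected one). A personalized reward model can be written generically by also conditioning on a user identifier u. The second line is an illustrative formulation under that assumption, not the exact objective of any of the eight methods benchmarked in the paper.

```latex
\begin{aligned}
\mathcal{L}_{\text{shared}}(\phi) &= -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}
  \Big[\log \sigma\big(r_\phi(x, y_w) - r_\phi(x, y_l)\big)\Big] \\
\mathcal{L}_{\text{pers}}(\phi) &= -\,\mathbb{E}_{(u,\,x,\,y_w,\,y_l)\sim\mathcal{D}}
  \Big[\log \sigma\big(r_\phi(x, y_w, u) - r_\phi(x, y_l, u)\big)\Big]
\end{aligned}
```

Under the shared objective, if a fraction p of users prefers response a over response b for the same prompt, the loss is minimized when σ(r_φ(x, a) − r_φ(x, b)) = p. With an even split the shared model becomes indifferent, which is the averaging effect discussed in the P-SOUPS example.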
Why it matters:
  • Standard alignment biases models toward Western, educated demographics, failing to serve the diverse cultural and ideological backgrounds of global users
  • Current evaluation is fragmented: studies use incomparable datasets (real-world data from narrow domains vs. synthetic data from general domains), which prevents fair comparison of algorithms
  • The extent to which personalization compromises general model capabilities such as safety and reasoning, termed the 'personalization tax', is largely unmeasured
Concrete Example: In the P-SOUPS dataset, users have diametrically opposing preferences on dimensions like 'expertise' or 'style'. A standard non-personalized reward model would average these conflicts, satisfying neither user group, whereas personalized models must adapt to each specific persona.
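The averaging problem can be illustrated with a toy oracle calculation. This sketch assumes two equally sized hypothetical user groups that give opposite labels on every prompt, standing in for the opposing 'expertise' dimension; the data are invented and not drawn from P-SOUPS, and the helper names are hypothetical. A shared predictor can at best pick the majority label per prompt, while a predictor that also sees the user group can match each group.

```python
from collections import Counter

# Hypothetical toy data: (user_group, prompt_id, preferred_style) triples.
# Group "A" always prefers the expert-style answer and group "B" the simple one,
# mimicking an opposing 'expertise' preference. Invented, not P-SOUPS data.
labels = [("A", p, "expert") for p in range(50)] + [("B", p, "simple") for p in range(50)]


def global_accuracy(labels):
    """Best accuracy a single shared predictor can reach: for each prompt,
    predict the majority label across all users (an oracle upper bound)."""
    by_prompt = {}
    for _, prompt, pref in labels:
        by_prompt.setdefault(prompt, Counter())[pref] += 1
    correct = sum(c.most_common(1)[0][1] for c in by_prompt.values())
    return correct / len(labels)


def personalized_accuracy(labels):
    """Best accuracy when the prediction may depend on (group, prompt)."""
    by_key = {}
    for group, prompt, pref in labels:
        by_key.setdefault((group, prompt), Counter())[pref] += 1
    correct = sum(c.most_common(1)[0][1] for c in by_key.values())
    return correct / len(labels)


print(global_accuracy(labels))
print(personalized_accuracy(labels))
```

With a 50/50 split the shared predictor is at chance (0.5) while the personalized one reaches 1.0. Skewing the split toward one group raises the shared ceiling to that group's share, but every prediction for the minority group is then wrong, which is how a single reward model marginalizes minority users.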
Key Novelty
Multi-Faceted Evaluation Framework for Personalized RLHF
  • Introduces a principled dataset analysis framework quantifying 'inter-user disagreement' and 'intra-user consistency' to predict where personalization is actually useful
  • Evaluates not only accuracy but also the 'personalization tax': the degradation in safety and reasoning when models overfit to specific user preferences
  • Benchmarks eight distinct personalization algorithms across three diverse datasets (synthetic and real) to isolate algorithmic strengths independent of data domain
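This summary does not give the paper's formulas for 'inter-user disagreement' and 'intra-user consistency'. The sketch uses hypothetical definitions based on pairwise label agreement: disagreement is how often different users label the same comparison differently, and consistency is how often one user's repeated labels on the same comparison agree. The record format and all function names are assumptions for illustration. Intuitively, personalization has the most room to help when disagreement is high and consistency is also high, since each user then provides a stable but distinct signal.

```python
from collections import defaultdict
from itertools import combinations

# HYPOTHETICAL metric definitions for illustration; not the paper's formulas.
# Each record is (user_id, item_id, label), where label in {0, 1} says which
# of the two candidate responses that user preferred for that item.


def pairwise_agreement(labels):
    """Fraction of unordered label pairs that match; None if fewer than 2 labels."""
    pairs = list(combinations(labels, 2))
    if not pairs:
        return None
    return sum(a == b for a, b in pairs) / len(pairs)


def mean(values, default):
    return sum(values) / len(values) if values else default


def inter_user_disagreement(records):
    """Mean over items rated by >= 2 users of the fraction of user pairs
    whose labels differ. Only each user's first label per item is used."""
    first_label = {}
    for user, item, label in records:
        first_label.setdefault((user, item), label)
    per_item = defaultdict(list)
    for (_, item), label in first_label.items():
        per_item[item].append(label)
    scores = []
    for item_labels in per_item.values():
        agreement = pairwise_agreement(item_labels)
        if agreement is not None:
            scores.append(1.0 - agreement)
    return mean(scores, default=0.0)


def intra_user_consistency(records):
    """Mean over (user, item) pairs labeled more than once of the fraction of
    that user's repeated labels that agree. Returns 1.0 if there are no repeats
    (an arbitrary convention chosen for this sketch)."""
    per_user_item = defaultdict(list)
    for user, item, label in records:
        per_user_item[(user, item)].append(label)
    scores = []
    for repeat_labels in per_user_item.values():
        agreement = pairwise_agreement(repeat_labels)
        if agreement is not None:
            scores.append(agreement)
    return mean(scores, default=1.0)


records = [
    ("u1", "q1", 1), ("u2", "q1", 0), ("u3", "q1", 1),
    ("u1", "q2", 1), ("u2", "q2", 1),
    ("u1", "q1", 1),  # u1 repeats q1 with the same label
    ("u2", "q2", 0),  # u2 flips its label on q2
]
print(inter_user_disagreement(records))
print(intra_user_consistency(records))
```

On this toy data, item q1 is contested (users split 2 to 1) while q2 is not, giving a disagreement of about 0.33; u1 is self-consistent on its repeat and u2 is not, giving a consistency of 0.5.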
Evaluation Highlights
  • Collaborative learning methods (e.g., Personalized RM) achieve up to +6% accuracy improvement over strong per-user fine-tuning baselines
  • Personalization introduces a 'safety tax', causing up to a 20% decline on safety and reasoning benchmarks compared to non-personalized base models
  • Performance gaps between different personalization methods reach up to 36% when user disagreement is high, but shrink significantly on datasets with low preference divergence
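The summary does not say whether the reported decline of up to 20% is an absolute or a relative drop. As one hypothetical way to report such a tax, the sketch computes the relative decline of a benchmark score after personalization. The function name and the scores are invented for illustration and are not results from the paper.

```python
def personalization_tax(base_score: float, personalized_score: float) -> float:
    """Relative decline (in percent) of a benchmark score after personalization.

    HYPOTHETICAL definition for illustration; the paper may instead report
    absolute (percentage-point) drops. Positive values mean the personalized
    model scores worse than the non-personalized base model.
    """
    if base_score <= 0:
        raise ValueError("base_score must be positive")
    return 100.0 * (base_score - personalized_score) / base_score


# Invented scores on a 0-1 scale; NOT results reported in the paper.
base = {"safety": 0.90, "reasoning": 0.60}
personalized = {"safety": 0.81, "reasoning": 0.57}

for bench in base:
    tax = personalization_tax(base[bench], personalized[bench])
    print(f"{bench}: tax = {tax:.1f}%")
```

Reporting the relative drop makes benchmarks with different score ranges comparable; an absolute drop in percentage points is the other common convention, and the two can differ substantially when base scores are low.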
Breakthrough Assessment
7/10
While not proposing a new architecture, it establishes a critical evaluation methodology and exposes the 'personalization tax', a significant finding for the safety/alignment community.