
Rethinking Personalization in Large Language Models at the Token Level

Chenheng Zhang, Yijun Lu, Lizhe Fang, Chunyuan Zheng, Jiajun Chai, Xiaohan Wang, Guojun Yin, Wei Lin, Yisen Wang, Zhouchen Lin
arXiv (2026)
P13N Benchmark Memory

📝 Paper Summary

Personalized Text Generation · LLM Training Objectives
PerCE improves LLM personalization by identifying and upweighting specific tokens that causally depend on user profile information during training, rather than treating all tokens equally.
Core Problem
Standard training averages the loss uniformly over all tokens, but personalization is sparse: only specific tokens (such as stylistic choices or entity preferences) actually depend on the user profile.
Why it matters:
  • Treating all tokens equally dilutes the model's focus on user-specific needs, limiting personalization performance
  • Existing methods focus on retrieval or data synthesis but overlook that different tokens contribute to personalization to varying degrees
  • Standard Cross-Entropy loss fails to prioritize the tokens that actually carry the personalization signal
Concrete Example: In personalized abstract generation, personalization shows up in stylistic tokens, whereas in dialogue it shows up in tokens that encode individual traits. A model trained with the standard objective weights common stopwords and these crucial personal tokens equally, so it fails to capture the user's unique voice.
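The contrast between the uniform objective and a token-weighted one can be written out as follows. This is a sketch in assumed notation, not the paper's own: x is the input, u the user profile, y_t the t-th of T target tokens, and w_t a non-negative per-token weight.

```latex
\mathcal{L}_{\text{CE}}(\theta)
  = -\frac{1}{T}\sum_{t=1}^{T} \log p_\theta\!\left(y_t \mid x, u, y_{<t}\right)
\qquad
\mathcal{L}_{\text{weighted}}(\theta)
  = -\frac{1}{T}\sum_{t=1}^{T} w_t \,\log p_\theta\!\left(y_t \mid x, u, y_{<t}\right)
```

Setting w_t = 1 for every token recovers the standard loss, in which a stopword and a profile-dependent token contribute equally. Choosing larger w_t for profile-dependent tokens shifts the gradient toward the tokens that carry the personalization signal.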
Key Novelty
PerCE (Personalized Cross-Entropy)
  • Uses a self-contrast metric (PerContrast) to measure the 'Personal Influence Ratio' (PIR) of each token by comparing the model's probability for that token with and without the user persona in the context
  • Applies an Expectation-Maximization (EM) style training loop: first estimate token importance via PIR (E-step), then optimize the model using weighted Cross-Entropy (M-step)
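The two steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the PIR is assumed to be the log-ratio of a token's probability with and without the persona, and the mapping from PIR to weights (a softmax rescaled to mean 1) is a hypothetical choice made only for this example. All function names and the `temperature` parameter are likewise hypothetical.

```python
import math

def personal_influence_ratio(logp_with, logp_without):
    """Assumed PIR: log p(token | context, persona) - log p(token | context)."""
    return [a - b for a, b in zip(logp_with, logp_without)]

def token_weights(pir, temperature=1.0):
    """Hypothetical E-step: softmax over tokens, rescaled so weights average to 1."""
    m = max(pir)  # subtract the max for numerical stability
    exps = [math.exp((s - m) / temperature) for s in pir]
    total = sum(exps)
    n = len(pir)
    return [n * e / total for e in exps]

def weighted_cross_entropy(logp_with, weights):
    """M-step objective: token-weighted negative log-likelihood, averaged over tokens."""
    return -sum(w * lp for w, lp in zip(weights, logp_with)) / len(logp_with)

# Toy sequence of three tokens; only the middle one depends on the persona.
logp_with = [math.log(0.9), math.log(0.6), math.log(0.5)]
logp_without = [math.log(0.9), math.log(0.1), math.log(0.5)]

pir = personal_influence_ratio(logp_with, logp_without)
weights = token_weights(pir)   # approx. [0.375, 2.25, 0.375]
loss = weighted_cross_entropy(logp_with, weights)
uniform_loss = weighted_cross_entropy(logp_with, [1.0, 1.0, 1.0])
```

In an actual training loop the weights would be computed from a forward pass with gradients disabled (the estimation step) and then held fixed as constants while the weighted loss is minimized (the optimization step). With uniform weights the function reduces to the standard Cross-Entropy average.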
Evaluation Highlights
  • +68.04% improvement in METEOR score on the Personalized Review Writing task (LongLaMP) with Qwen3-4B compared to standard Cross-Entropy
  • Achieves average gains of over 10% across all tasks and models on the LongLaMP benchmark
  • Demonstrates strong cross-task transfer: a Qwen3-4B model trained only on Topic Writing achieves +56.62% gain on Abstract Generation compared to the baseline
Breakthrough Assessment
8/10
Proposes a principled, causally grounded method for token-level personalization that yields large empirical gains (up to +68.04% METEOR on Personalized Review Writing) with minimal computational overhead.