
Pearl: Personalizing Large Language Model Writing Assistants with Generation-Calibrated Retrievers

Sheshera Mysore, Zhuoran Lu, Mengting Wan, Longqi Yang, Steve Menezes, Tina Baghaee, E. Gonzalez, Jennifer Neville, Tara Safavi
Microsoft, Purdue University
CUSTOMNLP4U (2023)
P13N Memory RAG

📝 Paper Summary

Conversational personalization, RAG-based personalization
Pearl personalizes writing assistants by training a retriever with a difference-of-likelihoods metric, so that it selects only those historical documents that empirically improve the text generated for a given user request.
Core Problem
LLM writing assistants generate generic text because they lack access to user-specific style and values; standard retrieval methods fail because they assume all user history is relevant, even when requests diverge from past behavior.
Why it matters:
  • Personalized fine-tuning is difficult to scale and serve for millions of individual users
  • Users have limited historical data compared to generic retrieval corpora, making high-precision retrieval critical
  • Standard retrieval models such as bi-encoders are often trained for relevance, not for whether a document actually helps the LLM generate better text
Concrete Example: A user asks an assistant to draft a work email. A standard retriever might fetch a past email simply because it shares keywords with the request, even if its tone is wrong. Pearl's retriever, trained on generation utility, would score that email low because it does not increase the likelihood of the desired target text. Conversely, it could select a document with less keyword overlap that provides the correct stylistic template.
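The utility check in the example above can be sketched as a difference of log-likelihoods: how much more likely the target text becomes when a historical document is added to the request. This is a minimal sketch, not Pearl's implementation. `toy_log_likelihood` is a hypothetical smoothed unigram scorer standing in for the auxiliary language model, and the request, target, and history strings are invented for illustration.

```python
import math
from collections import Counter


def toy_log_likelihood(target: str, context: str, vocab_size: int = 1000) -> float:
    """Hypothetical stand-in for an LLM: add-one-smoothed unigram model built from `context`."""
    counts = Counter(context.lower().split())
    total = sum(counts.values())
    return sum(
        math.log((counts[tok] + 1) / (total + vocab_size))
        for tok in target.lower().split()
    )


def likelihood_gain(request: str, document: str, target: str,
                    log_likelihood=toy_log_likelihood) -> float:
    """log p(target | request, document) - log p(target | request)."""
    with_doc = log_likelihood(target, request + " " + document)
    without_doc = log_likelihood(target, request)
    return with_doc - without_doc


# Invented data: one request, the text the user actually wrote, and two past documents.
request = "draft an email to my manager about the delayed release"
target = "hi sam quick heads up the release is slipping by a week cheers"
history = {
    "casual_note": "hi sam quick heads up lunch is on me this week cheers",
    "formal_memo": "dear colleagues please find attached the delayed release schedule regards",
}

gains = {name: likelihood_gain(request, doc, target) for name, doc in history.items()}
best = max(gains, key=gains.get)
print(best, gains)
```

In this toy run the formal memo shares more words with the request, yet the casual note yields the larger gain because it matches the wording of the target. Request-document pairs with a large gain are the kind a generation-aware retriever would be trained to prefer.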
Key Novelty
Generation-Calibrated Retrieval for Personalization
  • Selects training data by using an auxiliary model to find specific request-document pairs where the document significantly increases the likelihood of the ground-truth target text compared to the request alone
  • Uses a scale-calibrating loss function with an 'anchor' value so that the retriever's scores are proportional to actual downstream generation quality, preventing the score skew common in cross-encoders
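The summary does not give the loss formula, so the following is only one plausible reading of a scale-calibrating loss with an anchor: a softmax cross-entropy over candidate scores plus one extra entry whose logit is fixed. The function name, the anchor defaults, and the label values are all assumptions made for illustration. The point of the sketch is that an ordinary softmax loss is unchanged when every score is shifted by a constant, whereas the fixed anchor pins the scores to a definite scale.

```python
import math


def calibrated_softmax_loss(scores, labels, anchor_label=1.0, anchor_logit=0.0):
    """Softmax cross-entropy over candidates plus one fixed 'anchor' entry (assumed form)."""
    logits = list(scores) + [anchor_logit]
    targets = list(labels) + [anchor_label]
    # Numerically stable log-sum-exp over candidates and anchor.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(s - m) for s in logits))
    return -sum(y * (s - log_z) for s, y in zip(logits, targets))


# Hypothetical utility labels for two candidate documents (e.g. likelihood gains).
labels = [2.0, 0.5]

# With anchor_logit = 0 and anchor_label = 1, the loss is minimized at s_i = log(y_i).
calibrated = [math.log(y) for y in labels]
shifted = [s + 3.0 for s in calibrated]  # same ranking, different scale

loss_calibrated = calibrated_softmax_loss(calibrated, labels)
loss_shifted = calibrated_softmax_loss(shifted, labels)
print(loss_calibrated, loss_shifted)
```

Both score vectors rank the documents identically, but the shifted one incurs a higher loss, so training under this kind of objective pushes score magnitudes, not just their order, toward the utility labels.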
Breakthrough Assessment
7/10
Proposes a logical, theoretically grounded method for calibrating retrieval to generation utility. While generation-aware retrieval is not new, applying it to personalization with a specific calibration objective is a strong contribution.