Personalization of Large Language Models: A Survey

📝 Paper Summary

Personalized Text Generation · Recommendation · Personalization

This survey unifies the disconnected fields of personalized text generation and downstream task personalization into a single framework, formalizing shared definitions, granularities, and evaluation metrics.

Core Problem

Research on personalized LLMs is split between two largely isolated communities. One focuses on the quality of generated text, the other on downstream tasks such as recommendation. The two use different terminology and metrics even though they share underlying mechanisms.

Why it matters:

Prior surveys examine these aspects in isolation, missing opportunities to transfer techniques (e.g., retrieval methods) between generative and task-oriented personalization
The lack of a unified formalism hinders the development of generalist agents that can move seamlessly from personalized conversation to task-oriented reasoning

Concrete Example: A 'personalized text generation' researcher might evaluate a chatbot's empathy directly against user-written text, while a 'downstream task' researcher uses LLM embeddings to improve movie rating predictions. Both rely on user history and retrieval, but they optimize for different objectives (text quality vs. prediction accuracy) without cross-pollinating insights.

Key Novelty

Unified Personalization Taxonomy

Conceptualizes personalization as two sides of the same coin: 'Direct' (optimizing the text itself for user alignment) and 'Indirect' (using intermediate text/embeddings to optimize a separate function like a recommender system)
Formalizes personalization granularity into three levels: User-level (finest), Persona-level (group-based), and Global (general public), characterizing the trade-offs between data requirements and specificity

Architecture

A taxonomy and workflow diagram illustrating the two main usage categories of personalized LLMs: Personalized Text Generation and Downstream Task Personalization.

Breakthrough Assessment

5/10

A comprehensive survey that provides a necessary structural framework and taxonomy for a fragmented field, though it does not introduce a new algorithm or experimental breakthrough itself.

⚙️ Technical Details

Problem Definition

Setting: Adapting LLM outputs to user-specific contexts, either for direct generation or downstream tasks

Inputs: user text input x; user data D_u (documents, attributes, history)

Outputs: Personalized text y_hat or downstream prediction r_hat

Pipeline Flow

Query Generation: Input -> Query
Adaptation: Query + User Data -> Personalized Context
Prompt Generation: Input + Context -> Personalized Prompt
LLM Generation: Personalized Prompt -> Text/Embedding
Branching: Direct Output OR Downstream Model
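Read as function composition, using the module symbols from the System Modules list (phi_q, A, phi_p, the LLM M with parameters theta, and the downstream function F), the flow above can be sketched as follows. This is a shorthand reconstruction for orientation; the intermediate names q, c, and x-tilde are our own and may not match the survey's exact notation.

```latex
\begin{aligned}
q &= \phi_q(x), \qquad c = A(q, D_u), \qquad \tilde{x} = \phi_p(x, c),\\
\hat{y} &= M(\tilde{x}; \theta) && \text{(direct: personalized text)}\\
\hat{r} &= F\big(M(\tilde{x}; \theta)\big) && \text{(indirect: prediction from text or embeddings)}
\end{aligned}
```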

System Modules

Query Generation Function (Input Processing)

Transform user input x into a query suitable for retrieving user data

Model or implementation: Generic function phi_q
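A minimal sketch of phi_q, assuming the simplest possible design: the query is just the content words of x. The stopword list and the function name are hypothetical; real systems might instead prompt an LLM to rewrite x into a retrieval query.

```python
import re

# Hypothetical stopword list; a production system would use a fuller list
# or ask an LLM to rewrite the input into a search query.
STOPWORDS = {"a", "an", "the", "to", "for", "of", "and", "i", "me", "my",
             "please", "write", "about", "is", "in", "on"}


def phi_q(x: str) -> list[str]:
    """Toy query generation: lowercase x, split into word tokens,
    drop stopwords, and keep the first occurrence of each remaining term."""
    query: list[str] = []
    for tok in re.findall(r"[a-z0-9']+", x.lower()):
        if tok not in STOPWORDS and tok not in query:
            query.append(tok)
    return query


print(phi_q("Please write a short review of the new horror movie for me"))
# -> ['short', 'review', 'new', 'horror', 'movie']
```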

Adaptation Function

Integrate user-specific information (history, preferences) using the query

Model or implementation: Generic function A
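One common way to instantiate A is retrieval over the user's own documents. The sketch below assumes a crude term-overlap score as a stand-in for BM25 or a dense retriever; the function names and the toy D_u are hypothetical, not the survey's method.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercased word tokens of a document."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def adapt(query: list[str], user_docs: list[str], k: int = 2) -> list[str]:
    """Toy adaptation function A: score each document in D_u by how many
    query terms it contains and return up to k matching documents,
    best first (ties keep the original order)."""
    terms = set(query)
    scored = [(len(terms & tokenize(doc)), i, doc) for i, doc in enumerate(user_docs)]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [doc for score, _, doc in scored[:k] if score > 0]


user_docs = [  # hypothetical D_u: snippets of the user's past writing
    "Loved the slow-burn horror movie, great atmosphere.",
    "My favorite pasta recipe uses fresh basil.",
    "The new sci-fi movie was too long for me.",
]
print(adapt(["short", "review", "new", "horror", "movie"], user_docs))
```

Here the horror review and the sci-fi review each share two query terms and are returned, while the pasta note shares none and is dropped even though k would allow it.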

Personalized Prompt Generation (Input Processing)

Combine original input and adapted context into a final personalized input

Model or implementation: Generic function phi_p
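A minimal sketch of phi_p, assuming a plain text template that places the retrieved context ahead of the original input. The template wording and the empty-context fallback are illustrative choices, not taken from the survey.

```python
def phi_p(x: str, context: list[str]) -> str:
    """Toy personalized prompt generation: prepend the retrieved user
    context to the original input. The template wording is hypothetical."""
    if not context:
        # Nothing was retrieved, so fall back to the non-personalized input.
        return x
    history = "\n".join(f"- {doc}" for doc in context)
    return f"Examples of this user's past writing:\n{history}\n\nTask: {x}"


print(phi_p("Write a short review of the new horror movie.",
            ["Loved the slow-burn horror movie, great atmosphere."]))
```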

LLM

Generate personalized text or embeddings

Model or implementation: Model M parameterized by theta

Downstream Model

Use LLM output to perform a specific task (e.g., recommendation)

Model or implementation: Function F
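To illustrate the indirect branch, here is a sketch of one possible F for recommendation, assuming the LLM has already produced embeddings for items the user engaged with and for candidate items. The hand-written 3-d vectors are hypothetical placeholders for real LLM embeddings, and mean-pooling plus cosine ranking is just one simple choice of downstream model.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def F(history_embs: list[list[float]], candidates: dict[str, list[float]]) -> list[str]:
    """Toy downstream model: average the embeddings of items the user engaged
    with into a profile vector, then rank candidate items by cosine similarity."""
    profile = [sum(col) / len(history_embs) for col in zip(*history_embs)]
    scores = {item: cosine(profile, emb) for item, emb in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hand-written 3-d vectors standing in for LLM embeddings (hypothetical).
history = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
candidates = {
    "horror_film": [1.0, 0.0, 0.0],
    "cooking_show": [0.0, 0.1, 1.0],
    "thriller": [0.7, 0.3, 0.0],
}
print(F(history, candidates))  # -> ['horror_film', 'thriller', 'cooking_show']
```

Note that the LLM's output here is never read as text; only the ranking is evaluated, which is exactly why the interpretability limitation below arises.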

Novel Architectural Elements

Unified personalization workflow that explicitly branches into 'Direct Text Generation' (evaluating text quality) and 'Indirect Downstream Task' (evaluating task metrics), sharing the upstream Adaptation and Query functions

Comparison to Prior Work

vs. Chen (2023): Unifies text generation and downstream tasks (e.g., RecSys) into one framework, whereas prior surveys treated them separately
vs. General LLM Surveys: Introduces specific taxonomies for personalization granularity (User vs. Persona vs. Global) and adaptation functions [not cited in paper]

Limitations

Evaluation Scarcity: High-quality user-written ground truth for evaluating personalized generation is rare, making 'Direct' evaluation difficult
Privacy: Balancing personalization with user data privacy remains an open challenge not fully solved by current techniques
Interpretability: In downstream tasks, the intermediate personalized text or embeddings generated by the LLM often lack interpretability

Reproducibility

No replication artifacts are mentioned, as this is a survey paper.

📊 Experiments & Results

Main Takeaways

The fields of personalized text generation and downstream task personalization (like RecSys) share fundamental mechanisms (retrieval, user modeling) but have historically lacked a shared conceptual framework.
Personalization granularity is a key design choice: 'User-level' offers the finest control but requires dense data, while 'Persona-level' groups users to handle data sparsity (cold-start).
Evaluation remains a primary bottleneck: Direct generation lacks ground truth (user-written references are scarce), while downstream tasks often ignore the quality of the intermediate text generated by the LLM.
Future convergence is expected: Intelligent agents will likely merge these capabilities, seamlessly transitioning from personalized conversation to structured task completion.
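The evaluation split described above can be made concrete with two standard metrics: a reference-based text metric for direct generation (which needs a user-written reference) and an error metric for a downstream prediction. The sketch below uses a simplified ROUGE-L F1 on whitespace tokens and mean absolute error; it is not the official ROUGE implementation and the example strings and ratings are made up.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists (DP)."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, ta in enumerate(a, 1):
        for j, tb in enumerate(b, 1):
            if ta == tb:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-L F1 on lowercased whitespace tokens (direct evaluation
    against a user-written reference)."""
    c, r = candidate.lower().split(), reference.lower().split()
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)


def mae(predicted: list[float], actual: list[float]) -> float:
    """Mean absolute error (indirect evaluation of, e.g., predicted ratings)."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


print(rouge_l_f1("the movie was great", "the movie was too long"))  # ~0.667 (P = 3/4, R = 3/5)
print(mae([4.0, 3.5], [5.0, 3.0]))  # -> 0.75
```

The first metric is undefined without a user-written reference, which is the scarcity problem; the second says nothing about the intermediate text that produced the prediction.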

📚 Prerequisite Knowledge

Prerequisites

Large Language Models (LLMs)
Recommender Systems
Retrieval-Augmented Generation (RAG)

Key Terms

RAG (Retrieval-Augmented Generation): Systems that first retrieve relevant documents (e.g., user history) and insert them into the model's prompt before generating a response

RLHF (Reinforcement Learning from Human Feedback): Aligning models using rewards derived from human preferences

SFT (Supervised Fine-Tuning): Training a model on labeled examples, typically before applying other optimization techniques such as RLHF

Cold-start problem: The difficulty of effectively personalizing for new users who have no prior interaction history or data

Downstream task personalization: Leveraging LLM capabilities (text or embeddings) to improve performance on specific applications like recommendation systems, rather than focusing on the text quality itself

Persona-level personalization: Tailoring model behavior to groups of users with shared characteristics (personas) rather than individual users, useful when individual data is sparse
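The granularity trade-off and the cold-start problem can be illustrated with a fallback rule: use user-level data when there is enough of it, otherwise pool users who share a persona, otherwise fall back to global preferences. The threshold, the genre-count "profile", and all data below are hypothetical simplifications chosen only to show the fallback order.

```python
from collections import Counter

MIN_USER_ITEMS = 3  # hypothetical cutoff for "enough" individual history


def top_genre(items: list[str]) -> str:
    """Most frequent item; ties go to the one seen first, 'unknown' if empty."""
    counts = Counter(items).most_common(1)
    return counts[0][0] if counts else "unknown"


def choose_profile(user_id: str, histories: dict[str, list[str]],
                   personas: dict[str, str]) -> tuple[str, str]:
    """Use the finest granularity the data supports: user, then persona, then global."""
    own = histories.get(user_id, [])
    if len(own) >= MIN_USER_ITEMS:
        return "user", top_genre(own)
    persona = personas.get(user_id)
    peer_items = [g for uid, items in histories.items()
                  if persona is not None and personas.get(uid) == persona
                  for g in items]
    if peer_items:
        return "persona", top_genre(peer_items)
    all_items = [g for items in histories.values() for g in items]
    return "global", top_genre(all_items)


histories = {  # hypothetical genre histories
    "alice": ["horror", "horror", "drama"],
    "bob": ["comedy"],
    "carol": ["comedy", "comedy", "drama"],
    "dave": [],
}
personas = {"alice": "film_buff", "bob": "student", "carol": "student", "dave": "retiree"}

for user in ["alice", "bob", "dave", "erin"]:
    print(user, choose_profile(user, histories, personas))
```

With this data, alice has enough history for a user-level profile, bob (one item) borrows from the "student" persona he shares with carol, and dave (no history, no persona peers with data) and erin (a brand-new user) both fall back to the global profile.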