Guided Profile Generation Improves Personalization with LLMs

📝 Paper Summary

User Profiling LLM Personalization

GPG improves LLM personalization by first generating a descriptive natural language profile from raw user history using guiding questions, rather than feeding raw context directly.

Core Problem

LLMs struggle to effectively extract sparse, distinctive personal features from raw, complex user history and often default to generic training data behaviors.

Why it matters:

Directly feeding raw personal context (PC) to LLMs is often ineffective due to context length limits and the sparsity of key signals
LLMs prioritize imitating general training sets over specific user styles, failing to capture nuanced preferences without explicit steering
Reinforcement learning (RLHF) for personalization is resource-intensive and lacks ground truth labels for profile generation

Concrete Example: In tweet paraphrasing, a user might use block letters for emphasis. An LLM fed the raw history misses this spatial pattern and focuses on sentiment instead, producing a generic response.

Key Novelty

Guided Profile Generation (GPG)

Introduces an intermediate 'digestion' step where the LLM answers a specific guiding question about the raw history (e.g., 'List product categories')
Uses the digestion output to generate a concise, explainable natural language profile that summarizes user habits
Feeds this synthesized profile, rather than just raw data, to the downstream model to steer generation

Architecture

The GPG workflow demonstrating the three-step process: Context Digestion, Profile Generation, and Response Generation.

Evaluation Highlights

Increases accuracy by 37% in Amazon preference prediction compared to directly feeding the LLM with raw personal context
Improves METEOR score by 2.24 in Tweet paraphrasing (LAMP-7) by guiding the model to recognize specific writing features
Achieves 105.62% improvement in preference prediction over the no-context baseline by using self-generated profiles

Breakthrough Assessment

5/10

A practical prompting framework that significantly boosts personalization performance without training. While not a new architecture, it effectively addresses the context utilization bottleneck.

⚙️ Technical Details

Problem Definition

Setting: Personalized generation where a raw personal context (PC) and task (T) are given to generate a user-aligned response

Inputs: Personal Context (PC) and Task Query (Q)

Outputs: Personalized Response (e.g., predicted product, paraphrased tweet)

Pipeline Flow

Personal Context Digestion: Ask specific questions to extract key features from raw context
Guided Profile Generation: Generate a natural language profile using the digested answers as guidance
Response Generation: Use the generated profile (and optionally raw context) to answer the user query

System Modules

Context Digester

Extract specific features from raw history to provide direction

Model or implementation: gpt-3.5-turbo-1106

Profile Generator

Synthesize a descriptive natural language profile based on the guidance

Model or implementation: gpt-3.5-turbo-1106

Response Generator

Generate the final personalized response

Model or implementation: gpt-3.5-turbo-1106

Modeling

Base Model: gpt-3.5-turbo-1106

Compute: Inference-only (no training reported). Uses greedy decoding (temperature=0) and max_tokens=100.

Comparison to Prior Work

vs. LAMPSalemi: Generates a synthesis (profile) rather than just retrieving raw chunks, addressing the issue that retrievers often miss subtle stylistic cues
vs. PALRChen: Focuses on generating readable, explanatory natural language profiles specifically guided by intermediate questions, rather than just general summaries
vs. Direct Generation: Adds an explicit intermediate reasoning step (profile generation) to 'digest' context before answering

Limitations

Intermediate profile generation adds latency and cost due to multiple LLM calls
Relies on the quality of the 'guiding question' designed for each specific task
Evaluated only on gpt-3.5-turbo; scaling effects on stronger models (GPT-4) not explored

Reproducibility

No replication artifacts mentioned in the paper. The paper uses standard OpenAI API (gpt-3.5-turbo-1106). Evaluation datasets (Amazon, LAMP-7, PER-CHAT) are public.

📊 Experiments & Results

Evaluation Setup

Personalized text generation and prediction across three domains

Benchmarks:

Amazon Product Review (Preference Prediction (Multiple Choice)) [New]
LAMP-7 (Text Paraphrasing (Tweet style imitation))
PER-CHAT (Dialogue Response Generation)

Metrics:

Accuracy
METEOR
BLEU
ROUGE
BERT-Score
Statistical methodology: Not explicitly reported in the paper

Main Takeaways

Guided Profile Generation (GPG) improves accuracy by 37% on preference prediction compared to directly feeding raw context, showing that LLMs benefit from 'digested' summaries.
In style imitation (Tweets), GPG improves METEOR by 2.24, validating that explicit guidance helps the model capture nuanced stylistic features like capitalization or emojis.
Raw personal context is still useful: adding it to the final input along with the generated profile often yields the best results compared to profile-only or context-only.

📚 Prerequisite Knowledge

Prerequisites

Prompt Engineering
In-context Learning
Basics of Recommender Systems

Key Terms

GPG: Guided Profile Generation—the proposed method of creating an intermediate natural language profile to steer LLM personalization

PC: Personal Context—raw historical data associated with a user, such as purchase history or past tweets

METEOR: A metric for evaluating text generation (like translation or paraphrasing) that correlates well with human judgment

RAG: Retrieval-Augmented Generation—retrieving relevant documents to provide context for generation

RLHF: Reinforcement Learning from Human Feedback—training models using human preferences as a reward signal

LAMP-7: A dataset for evaluating personalized tweet paraphrasing

Zero-shot: Asking the model to perform a task without providing any examples