
Comparative Personalization for Multi-document Summarization

Haoyuan Li, Snigdha Chaturvedi
University of North Carolina at Chapel Hill
arXiv (2025)
P13N Benchmark Factuality RAG

📝 Paper Summary

Personalized Multi-document Summarization (MDS) User Modeling
ComPSum improves personalized summarization by explicitly comparing a user's past documents with those of other users to identify distinctive preferences, while AuthorMap provides a reference-free evaluation method based on authorship attribution.
Core Problem
Existing personalized text generation methods rely on general user profile summaries or retrieval, failing to capture fine-grained differences in writing style and content focus that distinguish one user from another.
Why it matters:
  • Users have conflicting preferences (e.g., formal vs. conversational tone, focus on price vs. quality), making generic summaries unsatisfactory.
  • Evaluating personalization is difficult due to the lack of reference summaries for new inputs; standard metrics like ROUGE cannot measure how well a summary matches a user's specific style.
  • Current datasets for personalized Multi-document Summarization (MDS) are limited; in particular, the news domain lacks datasets with user labels.
Concrete Example: In product reviews, User A might focus strictly on durability/price with a formal tone, while User B focuses on aesthetics with a casual tone. Standard methods might retrieve User A's past reviews but fail to highlight *how* A differs from B, resulting in a summary that is generically 'review-like' rather than specifically 'User A-like'.
Key Novelty
Comparative Personalization (ComPSum)
  • Generates a structured user analysis by retrieving a user's profile document and a 'comparative document' (same topic, different author) to explicitly contrast differences.
  • Uses this comparative analysis to guide an LLM in generating summaries that mimic the specific user's writing style and content focus.
  • Introduces AuthorMap, an evaluation framework that checks if an LLM can correctly identify the author of a profile given two generated summaries (authorship attribution as a proxy for personalization quality).
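The comparative step can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Doc` type, the retrieval rules (first match on topic), the prompt wording, and the `llm` callable are all hypothetical stand-ins introduced here.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Doc:
    author: str
    topic: str
    text: str


def retrieve_profile(user: str, topic: str, corpus: List[Doc]) -> Optional[Doc]:
    """Pick one past document by `user`, preferring the same topic (stand-in retrieval)."""
    own = [d for d in corpus if d.author == user]
    same_topic = [d for d in own if d.topic == topic]
    candidates = same_topic or own
    return candidates[0] if candidates else None


def retrieve_comparative(profile: Doc, corpus: List[Doc]) -> Optional[Doc]:
    """Pick a document on the profile's topic written by a different author."""
    for d in corpus:
        if d.topic == profile.topic and d.author != profile.author:
            return d
    return None


def analysis_prompt(profile: Doc, comparative: Doc) -> str:
    # Hypothetical prompt wording: asks for an explicit contrast between the two authors.
    return (
        "Compare the two documents on the same topic.\n"
        f"Target user's document:\n{profile.text}\n\n"
        f"Other user's document:\n{comparative.text}\n\n"
        "Describe how the target user's writing style and content focus differ."
    )


def summary_prompt(analysis: str, inputs: List[str]) -> str:
    # Hypothetical prompt wording: conditions the summary on the comparative analysis.
    joined = "\n---\n".join(inputs)
    return (
        f"User analysis:\n{analysis}\n\n"
        f"Input documents:\n{joined}\n\n"
        "Write a summary that matches this user's writing style and content focus."
    )


def personalized_summary(
    user: str,
    topic: str,
    inputs: List[str],
    corpus: List[Doc],
    llm: Callable[[str], str],
) -> str:
    """Two LLM calls: first a user analysis, then the summary guided by it."""
    profile = retrieve_profile(user, topic, corpus)
    if profile is None:
        raise ValueError(f"no past documents found for user {user!r}")
    comparative = retrieve_comparative(profile, corpus)
    if comparative is None:
        # Assumed fallback when no same-topic document by another author exists.
        analysis = llm(
            "Describe the writing style and content focus of:\n" + profile.text
        )
    else:
        analysis = llm(analysis_prompt(profile, comparative))
    return llm(summary_prompt(analysis, inputs))
```

Passing the model as a plain `Callable[[str], str]` keeps the sketch independent of any particular LLM client; any function that maps a prompt string to a completion string can be plugged in.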
Evaluation Highlights
  • ComPSum achieves the highest overall scores (averaging personalization, factuality, and relevance) across all tested LLMs (Llama-3.1-8B, Qwen2.5-14B, Llama-3.3-70B).
  • On the PerMSum news dataset, AuthorMap evaluation shows ComPSum outperforming the RAG baseline by 11.8 points in Writing Style accuracy with Llama-3.1-8B.
  • In human evaluation, AuthorMap aligns well with human judgments, achieving 80% agreement for writing style in news and content focus in reviews.
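To make the accuracy and overall-score figures concrete, here is a minimal sketch of how an attribution-style accuracy could be computed. It is not the paper's implementation: the `Trial` type, the `Judge` interface, the random ordering of the two summaries, and the unweighted mean in `overall_score` are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Trial:
    profile: str          # document written by the target user
    target_summary: str   # summary generated for the target user
    other_summary: str    # summary generated for a different user


# Hypothetical judge interface: given (profile, summary_0, summary_1), return
# 0 or 1, the index of the summary attributed to the profile's author.
Judge = Callable[[str, str, str], int]


def attribution_accuracy(trials: List[Trial], judge: Judge, seed: int = 0) -> float:
    """Percentage of trials in which the judge picks the target user's summary."""
    if not trials:
        raise ValueError("at least one trial is required")
    rng = random.Random(seed)
    correct = 0
    for t in trials:
        pair = [t.target_summary, t.other_summary]
        target_index = 0
        # Assumption: randomize presentation order so position alone cannot help.
        if rng.random() < 0.5:
            pair.reverse()
            target_index = 1
        if judge(t.profile, pair[0], pair[1]) == target_index:
            correct += 1
    return 100.0 * correct / len(trials)


def overall_score(personalization: float, factuality: float, relevance: float) -> float:
    """Unweighted mean of the three dimensions (assumed to share one scale)."""
    return (personalization + factuality + relevance) / 3.0
```

Under this setup a judge that guesses at random would score about 50, so a gain over a baseline is best read relative to that chance level.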
Breakthrough Assessment
7/10
Introduces a clever 'comparative' approach to profiling that improves distinctiveness, plus a valuable reference-free evaluation metric and a new dataset. However, it relies heavily on prompting existing LLMs rather than on novel architectural components.