
PEFT-U: Parameter-Efficient Fine-Tuning for User Personalization

Christopher Clarke, Yuzhao Heng, Lingjia Tang, Jason Mars
University of Michigan
arXiv.org (2024)
P13N Benchmark

📝 Paper Summary

Personalized LLMs, Subjective NLP Tasks, Parameter-Efficient Fine-Tuning (PEFT)
The paper introduces PEFT-U, a benchmark for subjective tasks where users disagree on identical inputs, and demonstrates that parameter-efficient fine-tuning outperforms prompting for modeling these individual perspectives.
Core Problem
LLMs typically take a 'one-size-fits-all' approach that aggregates user data into a single ground truth. This fails on subjective tasks, where different users can validly assign conflicting labels to the exact same input.
Why it matters:
  • Subjective applications like Hate Speech Detection and Humor Analysis depend entirely on individual user perspective, which generalized models ignore by favoring majority voting
  • Deploying a separate, fully fine-tuned model for every user is computationally prohibitive in production environments
  • Existing benchmarks usually discard disagreement as 'noise', thereby removing the very signals needed to train personalized models
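To make the deployment-cost point concrete, here is a rough sketch of the per-user storage arithmetic for a single linear layer. It compares a full per-user copy of the weights against a LoRA-style low-rank update. The layer size, rank, and user count are hypothetical values chosen for illustration; the source gives no such figures.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Weights stored per user if the whole d_in x d_out layer is copied."""
    return d_in * d_out


def low_rank_params(d_in: int, d_out: int, rank: int) -> int:
    """Weights stored per user for a rank-r update B @ A
    (A is rank x d_in, B is d_out x rank)."""
    return rank * (d_in + d_out)


# Hypothetical sizes, not taken from the paper.
D_IN, D_OUT, RANK, N_USERS = 768, 768, 8, 1000

full_total = N_USERS * full_params(D_IN, D_OUT)
low_rank_total = N_USERS * low_rank_params(D_IN, D_OUT, RANK)
savings = full_total / low_rank_total  # how many times smaller the low-rank store is
```

Under these assumed sizes the shared base weights are stored once, and only the small per-user update grows with the number of users.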
Concrete Example: In the HateXplain dataset, one user might label the phrase 'right definitely not going back to the fag hag thing' as 'normal', while another user labels the exact same text as 'offensive'. A standard LLM trained on majority vote would force a single label, ignoring the specific user's context.
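A minimal sketch of the contrast described above: majority voting collapses every annotation of a text into one label, while a personalized setup keeps one target per (user, text) pair. The rows below are hypothetical placeholders, not records from HateXplain, and the function names are illustrative.

```python
from collections import Counter

# Hypothetical annotations: (text_id, user_id, label).
annotations = [
    ("t1", "user_a", "normal"),
    ("t1", "user_b", "offensive"),
    ("t1", "user_c", "offensive"),
]


def majority_vote(rows):
    """Collapse all annotations of each text into a single label."""
    by_text = {}
    for text_id, _user, label in rows:
        by_text.setdefault(text_id, []).append(label)
    return {t: Counter(labels).most_common(1)[0][0] for t, labels in by_text.items()}


def per_user_targets(rows):
    """Keep one target per (user, text) pair."""
    return {(user, text_id): label for text_id, user, label in rows}


aggregated = majority_vote(annotations)
personal = per_user_targets(annotations)

# Users whose own label is overwritten by the aggregated label.
overridden = [user for (user, text_id), label in personal.items()
              if label != aggregated[text_id]]
```

Here `overridden` lists the users whose perspective a majority-vote model would never see during training.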
Key Novelty
PEFT-U Benchmark & Evaluation Framework
  • Reconstructs 13+ NLP datasets by treating individual annotators as distinct users, specifically filtering for tasks with low inter-annotator agreement (Krippendorff’s alpha ≤ 0.5) to ensure personalization is required
  • Comparative analysis of 'Parametric' personalization (updating specific weights via Adapters/LoRA per user) versus 'Non-Parametric' personalization (prompting with user examples)
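The agreement filter in the first bullet can be sketched as follows. This is a plain-Python implementation of Krippendorff's alpha for nominal labels using the standard coincidence-matrix formulation. It is not the authors' code, and the dataset names and label lists are hypothetical.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """units: list of lists, each holding the labels different annotators gave one item."""
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # an item with a single label cannot be paired
        for a, b in permutations(range(m), 2):
            coincidences[(labels[a], labels[b])] += 1.0 / (m - 1)

    totals = Counter()
    for (c, _k), weight in coincidences.items():
        totals[c] += weight
    n = sum(totals.values())

    observed = sum(w for (c, k), w in coincidences.items() if c != k)
    expected = sum(totals[c] * totals[k] for c in totals for k in totals if c != k)
    if expected == 0:
        # Alpha is formally undefined when only one label occurs;
        # treating it as full agreement is an assumption of this sketch.
        return 1.0
    return 1.0 - (n - 1) * observed / expected


# Hypothetical datasets: each inner list is one item's labels from two annotators.
datasets = {
    "hypothetical_consensus": [["a", "a"], ["b", "b"]],
    "hypothetical_subjective": [["a", "b"], ["a", "b"]],
}

alphas = {name: krippendorff_alpha_nominal(units) for name, units in datasets.items()}
kept = [name for name, alpha in alphas.items() if alpha <= 0.5]
```

Only datasets at or below the 0.5 threshold survive the filter, which is how the benchmark restricts itself to tasks where annotators genuinely disagree.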
Evaluation Highlights
  • Adapters achieved the highest overall accuracy of 64.4% across 13 personalized tasks, outperforming LoRA (59.5%)
  • Adapters outperformed other methods on 12 out of the 13 PEFT-U tasks
  • On average, personalized fine-tuning methods outperformed the zero-shot and few-shot prompting baselines
Breakthrough Assessment
7/10
A significant contribution to benchmarking subjective tasks where 'ground truth' varies by user. The finding that Adapters outperform LoRA in this setting is a useful empirical insight for personalization.