
Participatory Personalization in Classification

Hailey Joren, Chirag Nagpal, Katherine Heller, Berk Ustun
University of California San Diego, Google
arXiv (2023)
P13N Benchmark

📝 Paper Summary

Informed Consent in AI Group-based Personalization
Participatory systems allow users to opt into personalization at prediction time only if it benefits them, resolving trade-offs between privacy, data collection, and model performance.
Core Problem
Standard personalized models require users to report sensitive data without consent or any guarantee of benefit. This often leads to 'worsenalization', where reporting data actually degrades performance for certain groups.
Why it matters:
  • Individuals lack agency to opt out of reporting sensitive data (e.g., HIV status, income) to models.
  • Providing personal data does not always improve predictions; for some groups, generic models outperform personalized ones because of noise or small sample sizes.
  • Current systems violate the principle of collection limitation by gathering data that does not necessarily improve outcomes.
Concrete Example: In a stroke risk task, a standard personalized model might require 'age' and 'gender'. For the 'old female' group, the personalized model might have higher error (24) than a generic model trained without those features (0). A participatory system would let this group opt out, so its members receive the better generic prediction without reporting those attributes.
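The opt-out decision in this example comes down to comparing two error values for the group. A minimal sketch, assuming the figures 24 and 0 quoted above are the group's errors under each model; the function name `personalization_gain` is hypothetical and not taken from the paper:

```python
def personalization_gain(generic_error: float, personalized_error: float) -> float:
    """Gain from opting in: positive means the personalized model is better."""
    return generic_error - personalized_error


# Values quoted in the example above for the 'old female' group.
gain = personalization_gain(generic_error=0, personalized_error=24)

# A user should only opt in when the gain is strictly positive.
should_opt_in = gain > 0
print(gain, should_opt_in)
```

Here the gain is negative, so the group is better served by opting out and receiving the generic prediction.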
Key Novelty
Participatory Systems with Reporting Interfaces
  • Replaces a single static model with a system of models accessible via a 'reporting interface' (a decision tree of questions).
  • Treats inference as a market: users trade personal information for performance gains, only opting in when the personalized model provably outperforms the baseline.
  • Guarantees 'incentive compatibility' (opting in improves expected accuracy) and 'baseline performance' (opting out never performs worse than a generic model).
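The ideas above can be illustrated with a simplified sketch. It makes several assumptions that are not in the summary: the reporting interface is flattened to a single question (the group) rather than a decision tree, a personalized model is kept only when its held-out error is strictly lower than the generic model's, and all names (`ParticipatorySystem`, `build_system`, `error_rate`) and the toy data are hypothetical. The paper's actual guarantees concern expected performance and may be established differently.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, List, Optional, Sequence, Tuple

Features = Sequence[float]
Model = Callable[[Features], int]            # maps features to a class label
Holdout = Tuple[List[Features], List[int]]   # (inputs, labels) for one group


def error_rate(model: Model, X: List[Features], y: List[int]) -> float:
    """Fraction of held-out points the model misclassifies."""
    return sum(model(x) != label for x, label in zip(X, y)) / len(y)


@dataclass
class ParticipatorySystem:
    generic: Model
    personalized: Dict[Hashable, Model]  # only groups that benefit

    def predict(self, x: Features, group: Optional[Hashable] = None) -> int:
        # group=None means the user opted out of reporting.
        if group is None:
            return self.generic(x)
        # Groups without a beneficial personalized model fall back to generic.
        return self.personalized.get(group, self.generic)(x)


def build_system(
    generic: Model,
    candidates: Dict[Hashable, Model],
    holdout: Dict[Hashable, Holdout],
) -> ParticipatorySystem:
    """Keep a personalized model only where it beats the generic one."""
    kept: Dict[Hashable, Model] = {}
    for group, model in candidates.items():
        X, y = holdout[group]
        if error_rate(model, X, y) < error_rate(generic, X, y):
            kept[group] = model
    return ParticipatorySystem(generic, kept)


# Toy usage with hypothetical constant classifiers and data.
def always_zero(x: Features) -> int:
    return 0


def always_one(x: Features) -> int:
    return 1


holdout = {
    "A": ([[0.0], [1.0]], [1, 1]),  # personalized model helps here
    "B": ([[0.0], [1.0]], [0, 0]),  # personalized model hurts here
}
system = build_system(always_zero, {"A": always_one, "B": always_one}, holdout)

print(system.predict([0.0], group="A"))  # personalized prediction
print(system.predict([0.0], group="B"))  # falls back to generic
print(system.predict([0.0]))             # opted out: generic
```

Under these assumptions, opting in never routes a user to a model that did worse than the generic one on held-out data, and opting out always returns the generic prediction. Those two properties are a rough empirical analogue of the incentive compatibility and baseline performance guarantees described above.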
Evaluation Highlights
  • Participatory systems reduce error by up to 2.2% compared to standard personalization on the ACS Income dataset while requesting 60% less data.
  • Eliminates 'worsenalization' (negative gains from personalization) across all 6 clinical datasets tested; standard personalization harmed performance for 33% of groups on average.
  • Outperforms imputation baselines (e.g., MICE) by preventing performance degradation for groups where missingness would otherwise hurt accuracy.
Breakthrough Assessment
8/10
Strong conceptual contribution aligning ML with privacy/consent principles. Mathematically formalizes 'informed consent' in inference. Practical gains are consistent, though the method adds complexity to deployment.