Participatory Personalization in Classification

📝 Paper Summary

Informed Consent in AI Group-based Personalization

Participatory systems allow users to opt into personalization at prediction time only if it benefits them, resolving trade-offs between privacy, data collection, and model performance.

Core Problem

Standard personalized models force users to provide sensitive data without consent or guaranteed benefit, often leading to 'worsenalization' where providing data actually degrades performance for certain groups.

Why it matters:

Individuals lack agency to opt out of reporting sensitive data (e.g., HIV status, income) to models.
Providing personal data does not always improve predictions; for some groups, generic models outperform personalized ones because of noise or small sample sizes.
Current systems violate the principle of collection limitation by gathering data that does not necessarily improve outcomes.

Concrete Example: In a stroke risk task, a standard personalized model might require 'age' and 'gender'. For an 'old female' group, the personalized model might have higher error (24) than a generic model (0) trained without those features. A participatory system would let this group opt out and receive the better generic prediction without disclosing those attributes.
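The opt-out decision in this example can be sketched as a per-group comparison of error estimates. This is a minimal illustration, not the paper's code: only the 'old female' values (24 vs. 0) come from the example above, the other groups and numbers are invented, and `choose_model` and `worsenalized_groups` are hypothetical helpers.

```python
# Hypothetical per-group errors. Only the 'old female' values (24 vs. 0)
# come from the example above; the other entries are invented.
personalized_error = {"old female": 24, "old male": 10, "young female": 5}
generic_error = {"old female": 0, "old male": 12, "young female": 5}


def choose_model(group):
    """Return 'personalized' only when it strictly beats the generic model."""
    if personalized_error[group] < generic_error[group]:
        return "personalized"
    return "generic"  # opting out: no group attributes need to be reported


def worsenalized_groups():
    """Groups for which forced personalization would increase error."""
    return [g for g in personalized_error
            if personalized_error[g] > generic_error[g]]


choices = {g: choose_model(g) for g in personalized_error}
```

Under these assumed numbers, only the 'old female' group would be harmed by forced personalization, so it is routed to the generic model.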

Key Novelty

Participatory Systems with Reporting Interfaces

Replaces a single static model with a system of models accessible via a 'reporting interface' (a decision tree of questions).
Treats inference as a market: users trade personal information for performance gains, only opting in when the personalized model provably outperforms the baseline.
Guarantees 'incentive compatibility' (opting in improves expected accuracy) and 'baseline performance' (opting out never performs worse than a generic model).

Evaluation Highlights

Participatory systems reduce error by up to 2.2% (relative) compared to standard personalization on the ACS Income dataset while requesting 60% less data.
Eliminates 'worsenalization' (negative gains from personalization) across all 6 clinical datasets tested; standard personalization harmed performance for 33% of groups on average.
Outperforms imputation baselines (e.g., MICE) by preventing performance degradation for groups where missingness would otherwise hurt accuracy.

Breakthrough Assessment

8/10

Strong conceptual contribution aligning ML with privacy/consent principles. Mathematically formalizes 'informed consent' in inference. Practical gains are consistent, though the method adds complexity to deployment.

⚙️ Technical Details

Problem Definition

Setting: Classification with optional categorical group attributes G available at prediction time.

Inputs: Feature vector x_i and a subset of reported group attributes r_i (chosen by the user from available options R).

Outputs: Predicted label y_i.

Pipeline Flow

Reporting Interface (Tree Traversal)
User Decision (Opt-in/Opt-out)
Model Selection
Inference

System Modules

Reporting Interface

Present users with options to report specific group attributes (e.g., 'Report Age?', 'Report Sex?') or opt out.

Model or implementation: Tree structure T where nodes are reporting states

Inference Model Dictionary

Map the reported attribute set r to a specific pre-trained classifier.

Model or implementation: Set of classifiers {f_r} for each valid reporting state r
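The two modules can be sketched together as a small tree plus a lookup table. This is an illustrative sketch under stated assumptions: `ReportingNode`, `traverse`, and `predict` are hypothetical names, the single 'sex' question is invented, and the lambdas are toy stand-ins for pre-trained classifiers f_r.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Tuple

# A reporting state r: the set of (attribute, value) pairs reported so far.
ReportingState = FrozenSet[Tuple[str, str]]


@dataclass
class ReportingNode:
    """One node of the reporting tree T."""
    state: ReportingState
    question: str = ""  # attribute asked at this node; empty at a leaf
    children: Dict[str, "ReportingNode"] = field(default_factory=dict)


def traverse(root, answers):
    """Walk the tree and stop when the user opts out or no question remains."""
    node = root
    while node.question:
        value = answers.get(node.question)  # None means the user opts out
        if value is None or value not in node.children:
            break
        node = node.children[value]
    return node.state


root = ReportingNode(
    state=frozenset(),
    question="sex",
    children={
        "female": ReportingNode(state=frozenset({("sex", "female")})),
        "male": ReportingNode(state=frozenset({("sex", "male")})),
    },
)

# Toy stand-ins for the classifiers f_r, keyed by reporting state r.
models = {
    frozenset(): lambda x: 0,  # generic model, used when nothing is reported
    frozenset({("sex", "female")}): lambda x: int(x[0] > 0.5),
    frozenset({("sex", "male")}): lambda x: int(x[0] > 0.7),
}


def predict(x, answers):
    """Select the model for the user's reporting state and run inference."""
    return models[traverse(root, answers)](x)
```

A user who answers nothing is served by the generic model keyed by the empty state, which is how opting out maps to a coarser model in this sketch.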

Novel Architectural Elements

The 'Reporting Interface' as a structural component of the inference pipeline, allowing dynamic model selection based on user consent.
Recursive partitioning algorithm (Greedy-Split) adapted to construct the reporting tree by maximizing population-level risk reduction.

Modeling

Base Model: XGBoost (used as the base classifier for all nodes in the experiments)

Training Method: Recursive partitioning (Greedy-Split algorithm) to build the reporting tree, training XGBoost classifiers at each node.

Objective Functions:

Purpose: Select the best attribute to split on in the reporting tree.

Formally: Maximize $\mathrm{Gain}(r, j) = R(f_T(\cdot, r)) - \left[ P(g_j \neq 0 \mid r)\, R(f_{\mathrm{new}}(\cdot, r \cup j)) + P(g_j = 0 \mid r)\, R(f_T(\cdot, r)) \right]$.
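The gain can be computed directly from three quantities. Because the two probabilities sum to one, the expression simplifies to P(g_j != 0 | r) times the risk reduction R(f_T(., r)) - R(f_new(., r U j)). The sketch below assumes the risks and probabilities have already been estimated (for example on validation data); `gain` and `best_attribute` are hypothetical helper names and the numbers in the usage lines are invented.

```python
def gain(risk_current, risk_new, p_nonzero):
    """Gain(r, j) for splitting reporting state r on attribute j.

    risk_current: R(f_T(., r)), risk of the model currently assigned to r
    risk_new:     R(f_new(., r U j)), risk of the model that also uses j
    p_nonzero:    P(g_j != 0 | r)
    """
    p_zero = 1.0 - p_nonzero  # P(g_j = 0 | r)
    return risk_current - (p_nonzero * risk_new + p_zero * risk_current)


def best_attribute(risk_current, candidates):
    """Pick the attribute with the largest gain.

    candidates maps attribute name -> (risk_new, p_nonzero).
    """
    gains = {j: gain(risk_current, r_new, p)
             for j, (r_new, p) in candidates.items()}
    best = max(gains, key=gains.get)
    return best, gains[best]


# Invented numbers: 'sex' gives a smaller risk reduction than 'age' but
# applies to more users, so its gain is larger.
choice, value = best_attribute(0.25, {"age": (0.20, 0.4), "sex": (0.22, 0.9)})
```

In this toy case the greedy step would split on 'sex', since a modest improvement that applies to most of the population outweighs a larger improvement that applies to few.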

Training Data:

Data split: 50% training, 25% validation, 25% testing
Datasets: ACS Income, ACSEmployment, HMDA Mortgage, heart_failure, stroke_prediction, diabetes
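A 50/25/25 split can be sketched with index shuffling. This is only an illustration: the summary does not state whether the paper stratifies the split or which random seed it uses, so `split_indices` and its `seed` parameter are assumptions.

```python
import random


def split_indices(n, seed=0):
    """Shuffle indices 0..n-1 and cut them into 50% / 25% / 25% parts."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = n // 2
    n_val = n // 4
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


train, val, test = split_indices(1000)
```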

Key Hyperparameters:

max_depth: varies (2, 3, 5)
min_samples_leaf: 50 (to estimate risk)
xgboost_estimators: 100
xgboost_max_depth: 3

Compute: Not reported in the paper

Comparison to Prior Work

vs. Standard Personalization: Participatory systems allow selective reporting; Standard forces reporting and may degrade performance (worsenalization).
vs. Imputation: Participatory systems treat 'missingness' as a choice to use a coarser model, rather than hallucinating data.
vs. DRO (Distributionally Robust Optimization) [not cited in paper]: DRO optimizes for worst-case groups assuming data is present; Participatory systems optimize for best-case data availability given consent.
vs. Selective Classification [not cited in paper]: Selective classification allows the model to abstain; Participatory systems allow the *user* to abstain from providing features.

Limitations

Assumes users are rational agents who will opt in if shown a performance gain (Incentive Compatibility assumption).
Requires sufficient data for every subgroup in the reporting tree to reliably estimate risk and train specific models.
Recursive splitting reduces effective sample size for deep nodes in the reporting tree.
Does not address privacy attacks (e.g., membership inference) beyond the consent mechanism itself.

Reproducibility

Code: https://github.com/haileyjoren/participatory-personalization

Code is publicly available. The paper uses public datasets (ACS, HMDA, various clinical datasets). Hyperparameters for the base XGBoost models and the greedy splitting algorithm are specified.

📊 Experiments & Results

Evaluation Setup

Tabular classification on clinical and socioeconomic datasets.

Benchmarks:

ACS Income: binary income prediction
HMDA Mortgage: mortgage approval prediction
Clinical Datasets (Heart, Stroke, Diabetes, etc.): disease risk prediction

Metrics:

Test Error (Accuracy/Risk)
Data Use (Average number of attributes reported per user)
Percentage of groups suffering worsenalization
Statistical methodology: Bootstrap resampling (100 trials) to generate confidence intervals (shaded regions in plots).
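A percentile bootstrap over per-example mistakes is one way to produce such intervals. The sketch below uses 100 trials as stated above; the 95% level (`alpha=0.05`), the percentile method, and the helper name `bootstrap_error_interval` are assumptions, since the summary does not specify them.

```python
import random


def bootstrap_error_interval(errors, n_trials=100, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of 0/1 test mistakes."""
    rng = random.Random(seed)
    n = len(errors)
    means = []
    for _ in range(n_trials):
        # Resample the test set with replacement and recompute the error rate.
        sample = [errors[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lower = means[int((alpha / 2) * n_trials)]
    upper = means[min(n_trials - 1, int((1 - alpha / 2) * n_trials))]
    return lower, upper


# Invented data: 20 mistakes out of 100 test examples.
lower, upper = bootstrap_error_interval([1] * 20 + [0] * 80)
```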

Key Results

Benchmark	Metric	Baseline	This Paper	Δ
Average across 6 clinical datasets	% Groups Worsenalized	33.0	0.0	-33.0
ACS Income (CA)	Error Rate	0.225	0.220	-0.005
ACS Income (CA)	Attributes Reported	2.0	0.8	-1.2
Heart Failure	Error Rate	0.18	0.16	-0.02
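The absolute and relative changes in the table can be checked with a few lines of arithmetic. The values are copied from the rows above; the helper name `changes` and the rounding precision are choices made for this sketch.

```python
# (benchmark, metric, baseline, this paper), copied from the table above.
rows = [
    ("ACS Income (CA)", "Error Rate", 0.225, 0.220),
    ("ACS Income (CA)", "Attributes Reported", 2.0, 0.8),
    ("Heart Failure", "Error Rate", 0.18, 0.16),
]


def changes(baseline, ours):
    """Return (absolute change, relative change in percent), rounded."""
    delta = ours - baseline
    return round(delta, 3), round(100 * delta / baseline, 1)


summary = {(bench, metric): changes(base, ours)
           for bench, metric, base, ours in rows}
```

For ACS Income (CA) this gives a relative error change of -2.2% and a relative change in attributes reported of -60%, which appears to match the figures quoted in the Evaluation Highlights.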

Experiment Figures

Pareto plots of Test Error (y-axis) vs. Average attributes reported (x-axis) for ACS Income across different states.

Comparison of 'Worsenalization' (negative gains) across clinical datasets.

Main Takeaways

Participatory systems consistently lie on the Pareto frontier of Accuracy vs. Data Use, often outperforming methods that use all data by selectively discarding 'harmful' personalization.
The 'Sequential' interface type typically offers the best trade-off, querying the most predictive attributes first.
Imputation methods (Mean, MICE) often fail to recover the performance of true data and can perform worse than simple generic models in this context.
The approach is robust to the number of available group attributes, showing gains even with just 2-3 sensitive features.

📚 Prerequisite Knowledge

Prerequisites

Supervised classification
Basic probability (conditional risk)
Decision trees (for the reporting interface structure)

Key Terms

worsenalization: A phenomenon where a personalized model performs worse for a specific group than a generic model trained without group attributes

participatory system: A classification system that allows individuals to choose which personal attributes to report at inference time based on expected performance gains

reporting interface: A tree structure defining the sequence of questions (attributes) a user can choose to answer or skip (opt-out)

incentive compatibility: The property that reporting more information (opting in) strictly improves expected model performance

baseline performance: The property that opting out entirely guarantees performance at least as good as a generic model trained without any group attributes

MICE: Multiple Imputation by Chained Equations—a statistical method for handling missing data by creating multiple plausible imputed datasets

ERM: Empirical Risk Minimization—a principle in statistical learning theory where the model is chosen to minimize the average loss on the training dataset