MAP: Model Aggregation and Personalization in Federated Learning with Incomplete Classes

📝 Paper Summary

Federated Learning (FL) · Non-I.I.D. Data · Model Aggregation · Personalized Federated Learning

MAP combines a restricted softmax for better global aggregation and inherited private models for better personalization to handle federated learning scenarios where clients possess only a subset of all possible classes.

Core Problem

In federated learning with incomplete classes (where clients only see a subset of total classes), standard aggregation suffers because missing classes degrade the global model, while standard personalization discards valuable historical local knowledge.

Why it matters:

Real-world FL clients often lack data for specific classes (e.g., a user hasn't downloaded all apps, or a hospital sees only certain diseases), creating severe Non-I.I.D. challenges.
Existing methods typically optimize either global aggregation or local personalization, but rarely both simultaneously under extreme label skew.
Standard softmax pulls weights of missing classes towards negative infinity during local updates, damaging the global model's ability to recognize those classes.

Concrete Example: In a 10-class classification task, if Client A only sees classes 1-5, standard training pushes the weights for classes 6-10 (missing classes) to extremely small values to minimize their probability. When aggregated, this degrades the global model's performance on classes 6-10, even if other clients have data for them.
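The effect in this example can be reproduced with a minimal sketch. It assumes, purely for illustration, that the ten class logits are free parameters (similar to per-class bias terms in the last layer) and uses an arbitrary learning rate and step count. The cross-entropy gradient with respect to a logit is the softmax probability minus the one-hot target, so a class that never appears as a label receives a strictly positive gradient at every step and its logit can only decrease.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ce_logit_gradient(logits, label):
    """Gradient of cross-entropy w.r.t. the logits: softmax(logits) - one_hot(label)."""
    probs = softmax(logits)
    return [p - (1.0 if c == label else 0.0) for c, p in enumerate(probs)]

# Toy client: 10 classes, but labels only come from indices 0-4
# (classes 1-5 in the text). Logits are treated as free parameters (assumption).
logits = [0.0] * 10
learning_rate = 0.5  # arbitrary illustrative value
for step in range(100):
    label = step % 5  # cycle through the observed classes only
    grad = ce_logit_gradient(logits, label)
    logits = [z - learning_rate * g for z, g in zip(logits, grad)]

observed_logits = logits[:5]
missing_logits = logits[5:]
print("observed:", [round(z, 2) for z in observed_logits])
print("missing: ", [round(z, 2) for z in missing_logits])
```

The missing-class logits end up negative and below every observed-class logit, even though no sample ever said anything about those classes. Averaging such a client into the global model drags the corresponding entries down.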

Key Novelty

MAP (Model Aggregation and Personalization)

Improves aggregation by using 'Restricted Softmax' (RS), which prevents the weights of missing classes from being driven toward large negative values during local training, so they remain usable when the server aggregates client models.
Improves personalization by using 'Inherited Private Model' (HPM), an ensemble of the client's historical personalized models that supervises the current round's local training and prevents catastrophic forgetting of client-specific knowledge.

Evaluation Highlights

+3.43 to +11.66 percentage points in global aggregation accuracy over FedAvg across the CIFAR-10, CIFAR-100, and CINIC-10 benchmarks.
+4.04 to +14.67 percentage points in personalization accuracy over FedAvg on the same benchmarks.
Outperforms state-of-the-art methods like FedROD and FedRS in both aggregation and personalization metrics simultaneously.

Breakthrough Assessment

7/10

Offers a solid, mathematically motivated solution to a specific but common FL problem (incomplete classes). Effectively combines two previous techniques (FedRS and FedPHP) to solve the dual objective problem.

⚙️ Technical Details

Problem Definition

Setting: Federated Learning with incomplete classes (extreme label distribution skew)

Inputs: Distributed client datasets D^k where P^k(y=c)=0 for missing classes c

Outputs: A global model maximizing accuracy on all classes C, and one personalized model per client k maximizing accuracy on that client's observed classes O^k

Pipeline Flow

Server broadcasts global model
Client Initialization: Client receives global model w_g
Personalization Phase (with HPM): Client updates w_g using local data + distillation from Inherited Private Model (HPM)
Aggregation Phase (with RS): Client further updates model using Restricted Softmax loss to prepare for upload
HPM Update: Client updates its HPM using a moving average of the personalized model
Upload & Aggregate: Client uploads aggregation-ready model; Server averages them
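The control flow of one communication round can be sketched as follows. This is a structural outline only: the model is a flat list of floats, the three client-side steps are passed in as hypothetical callables (`personalize`, `prepare_upload`, `update_hpm`), the aggregation phase is assumed to start from the personalized model, and the server is assumed to take an unweighted mean. None of these details are confirmed by the summary.

```python
from typing import Callable, List, Sequence, Tuple

Model = List[float]  # flattened parameters; a stand-in for real network weights

def average_models(models: Sequence[Model]) -> Model:
    """Server step: unweighted element-wise mean of the uploaded models (assumption)."""
    n = len(models)
    return [sum(values) / n for values in zip(*models)]

def run_round(
    global_model: Model,
    hpms: Sequence[Model],
    personalize: Callable[[Model, Model], Model],
    prepare_upload: Callable[[Model], Model],
    update_hpm: Callable[[Model, Model], Model],
) -> Tuple[Model, List[Model]]:
    """One round: broadcast, two local phases per client, HPM update, averaging."""
    uploads, new_hpms = [], []
    for hpm in hpms:
        # Personalization phase: start from the broadcast model, supervised by the HPM.
        personalized = personalize(list(global_model), hpm)
        # Aggregation phase: continue training (with the RS loss) before upload.
        uploads.append(prepare_upload(list(personalized)))
        # The HPM stays on the client and absorbs the new personalized model.
        new_hpms.append(update_hpm(hpm, personalized))
    return average_models(uploads), new_hpms

# Toy usage with stand-in steps (no real training happens here).
new_global, new_hpms = run_round(
    global_model=[0.0],
    hpms=[[1.0], [3.0]],
    personalize=lambda w, h: [(a + b) / 2 for a, b in zip(w, h)],
    prepare_upload=lambda w: w,
    update_hpm=lambda h, w: [0.9 * a + 0.1 * b for a, b in zip(h, w)],
)
print(new_global, new_hpms)
```

Keeping the steps as injected callables makes the ordering explicit: the uploaded model and the locally retained HPM come from the same round but are different objects.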

System Modules

Restricted Softmax (RS)

Modifies the standard cross-entropy loss to exclude missing classes from the softmax normalization term

Model or implementation: Softmax modification
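A minimal sketch of the loss as described here, in plain Python: the log-normalizer sums over the client's observed classes only. The function names are illustrative, and the paper's exact formulation may differ (for instance, it may down-weight missing-class logits rather than drop them entirely).

```python
import math

def restricted_log_norm(logits, observed):
    """Log-sum-exp taken over the observed classes only."""
    kept = [logits[c] for c in observed]
    m = max(kept)
    return m + math.log(sum(math.exp(z - m) for z in kept))

def restricted_softmax_loss(logits, label, observed_classes):
    """Cross-entropy whose normalization term excludes missing classes."""
    observed = sorted(set(observed_classes))
    if label not in observed:
        raise ValueError("label must be one of the client's observed classes")
    return restricted_log_norm(logits, observed) - logits[label]

def restricted_softmax_grad(logits, label, observed_classes):
    """Gradient w.r.t. the logits; entries for missing classes are exactly zero."""
    observed = sorted(set(observed_classes))
    log_norm = restricted_log_norm(logits, observed)
    grad = [0.0] * len(logits)
    for c in observed:
        prob = math.exp(logits[c] - log_norm)
        grad[c] = prob - (1.0 if c == label else 0.0)
    return grad

# 10 classes, client observes indices 0-4 only.
logits = [0.0] * 10
loss = restricted_softmax_loss(logits, label=2, observed_classes=range(5))
grad = restricted_softmax_grad(logits, label=2, observed_classes=range(5))
print(round(loss, 4), [round(g, 2) for g in grad])
```

Because the missing-class logits do not enter the loss, they receive no gradient from local data, so local training leaves the corresponding classifier weights where the global model put them.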

Inherited Private Model (HPM)

Stores historical personalization knowledge to supervise current local training via knowledge distillation

Model or implementation: Moving average of past local models
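The moving-average update can be sketched as below. It assumes an exponential moving average in which the momentum is the weight kept on the existing HPM, and it represents a model as a dict of parameter lists. The default of 0.9 matches the HPM momentum listed under the hyperparameters, but which side that weight applies to is an assumption.

```python
def update_hpm(hpm, personalized, momentum=0.9):
    """Blend the stored HPM with the newest personalized model, parameter by parameter.

    new_hpm = momentum * hpm + (1 - momentum) * personalized  (assumed form)
    """
    return {
        name: [
            momentum * old + (1.0 - momentum) * new
            for old, new in zip(hpm[name], personalized[name])
        ]
        for name in hpm
    }

# Toy parameters: one layer with two weights.
hpm = {"fc.weight": [1.0, 0.0]}
personalized = {"fc.weight": [0.0, 1.0]}
hpm = update_hpm(hpm, personalized)
print(hpm)
```

With a momentum close to 1 the HPM changes slowly, which is what lets it act as a memory of earlier rounds rather than a copy of the latest model.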

Novel Architectural Elements

Dual-phase local training loop: one phase for personalization (using HPM), one for aggregation (using RS)
Decoupled models: The model uploaded to the server is different from the model kept locally for personalization

Modeling

Base Model: ResNet-20 (for CIFAR-10), ResNet-18 (for CIFAR-100/CINIC-10)

Training Method: Federated Learning (iterative local training and global aggregation)

Objective Functions:

Purpose: Personalization update.

Formally: L_per = L_CE(w) + lambda * L_KL(w, HPM)

Purpose: Aggregation update.

Formally: L_agg = L_RS(w) (Restricted Softmax Loss)
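As an illustration of the personalization objective for a single sample, the sketch below combines cross-entropy on the true label with a KL term toward the HPM's predictions. The KL direction (HPM as teacher, current model as student) and the absence of a distillation temperature are assumptions. The default weight 0.1 is the distillation weight listed under the hyperparameters.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_norm for z in logits]

def personalization_loss(student_logits, teacher_logits, label, lam=0.1):
    """L_per = CE(student, label) + lam * KL(teacher || student) for one sample."""
    s_logp = log_softmax(student_logits)
    t_logp = log_softmax(teacher_logits)  # teacher = HPM outputs (assumed role)
    ce = -s_logp[label]
    kl = sum(math.exp(t) * (t - s) for t, s in zip(t_logp, s_logp))
    return ce + lam * kl

# When the HPM agrees with the current model, only the cross-entropy remains.
agree = personalization_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], label=2)
# When the HPM disagrees, the KL term adds a penalty.
disagree = personalization_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0], label=2)
print(round(agree, 4), round(disagree, 4))
```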

Training Data:

CIFAR-10: 10 classes, 50k train / 10k test
CIFAR-100: 100 classes, 50k train / 10k test
CINIC-10: 10 classes, 90k train / 90k test

Key Hyperparameters:

learning_rate: 0.01
batch_size: 50
local_epochs: 5
HPM_update_momentum_alpha: 0.9
distillation_weight_lambda: 0.1
communication_rounds: 200 (CIFAR-10) / 500 (others)

Comparison to Prior Work

vs. FedRS: MAP adds the HPM component to improve personalization, which FedRS lacks.
vs. FedPHP: MAP adds the RS component to improve aggregation, which FedPHP lacks.
vs. FedROD: MAP explicitly addresses the 'missing class' scenario with Restricted Softmax rather than just generic re-weighting, and uses historical ensembling for personalization.

Limitations

Assumes the set of incomplete classes is known or detectable.
Requires each client to keep a separate HPM alongside its working model, roughly doubling the local storage needed for model parameters.
Evaluated primarily on image classification tasks; applicability to NLP or other modalities not tested.
Requires two separate forward/backward passes (or phases) during local training, increasing local computation.

Reproducibility

Code is not provided in the paper. Detailed hyperparameters and data partition settings (Dirichlet distribution alpha=0.5, missing class ratio 0.1 to 0.5) are described.

📊 Experiments & Results

Evaluation Setup

Simulated FL with incomplete classes using Dirichlet distribution to partition data.
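One way such a partition could be simulated is sketched below. This is a hypothetical procedure, not the paper's: each client first drops a fixed fraction of classes at random, then draws its class proportions over the remaining classes from a symmetric Dirichlet distribution (sampled via normalized Gamma draws). The function name, the order of the two steps, and the seed are all assumptions.

```python
import random

def sample_client_class_proportions(num_classes, num_clients,
                                    alpha=0.5, missing_ratio=0.5, seed=0):
    """Return one class-proportion vector per client, with zeros for missing classes."""
    rng = random.Random(seed)
    num_missing = int(round(missing_ratio * num_classes))
    proportions = []
    for _ in range(num_clients):
        # Step 1 (assumed): pick which classes this client never sees.
        missing = set(rng.sample(range(num_classes), num_missing))
        # Step 2 (assumed): Dirichlet(alpha) over the observed classes,
        # obtained by normalizing independent Gamma(alpha, 1) draws.
        draws = [0.0 if c in missing else rng.gammavariate(alpha, 1.0)
                 for c in range(num_classes)]
        total = sum(draws)
        proportions.append([d / total for d in draws])
    return proportions

clients = sample_client_class_proportions(num_classes=10, num_clients=4)
for row in clients:
    print([round(p, 2) for p in row])
```

A smaller `alpha` concentrates each client's data on fewer of its observed classes, while `missing_ratio` controls how many classes are absent outright.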

Benchmarks:

CIFAR-10 (Image Classification)
CIFAR-100 (Image Classification)
CINIC-10 (Image Classification)

Metrics:

Global Aggregation Accuracy (test accuracy of global model on all classes)
Personalization Accuracy (average test accuracy of local models on local observed classes)
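The two metrics can be made concrete with a small sketch. It assumes personalization accuracy is an unweighted mean over clients, and that each client is scored only on test samples whose labels fall in its observed class set. The function names and toy data are illustrative.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)

def personalization_accuracy(per_client_predictions, labels, observed_sets):
    """Mean over clients of accuracy restricted to each client's observed classes."""
    scores = []
    for predictions, observed in zip(per_client_predictions, observed_sets):
        kept = [(p, y) for p, y in zip(predictions, labels) if y in observed]
        scores.append(accuracy([p for p, _ in kept], [y for _, y in kept]))
    return sum(scores) / len(scores)

labels = [0, 1, 2, 3]

# Global metric: one model, scored on every class.
global_acc = accuracy([0, 1, 2, 0], labels)

# Personalization metric: client A observes {0, 1}, client B observes {2, 3}.
personal_acc = personalization_accuracy(
    per_client_predictions=[[0, 1, 0, 0], [2, 2, 2, 2]],
    labels=labels,
    observed_sets=[{0, 1}, {2, 3}],
)
print(global_acc, personal_acc)
```

Client A's mistakes on classes 2 and 3 do not count against it, which is why personalization accuracy can be much higher than global accuracy under incomplete classes.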

Key Results

Global Aggregation Accuracy comparisons on CIFAR-10 show MAP outperforming all baselines.

Benchmark	Metric	Baseline (%)	This Paper (%)	Δ (pp)
CIFAR-10	Aggregation Accuracy	78.25	89.91	+11.66
CIFAR-10	Aggregation Accuracy	86.32	89.91	+3.59

Personalization Accuracy comparisons on CIFAR-10 show MAP achieving superior local performance.

Benchmark	Metric	Baseline (%)	This Paper (%)	Δ (pp)
CIFAR-10	Personalization Accuracy	81.65	96.32	+14.67
CIFAR-10	Personalization Accuracy	94.38	96.32	+1.94

Ablation studies demonstrate the contribution of RS and HPM components individually.

Benchmark	Metric	Baseline (%)	This Paper (%)	Δ (pp)
CIFAR-10	Aggregation Accuracy	81.98	89.91	+7.93
CIFAR-10	Personalization Accuracy	93.45	96.32	+2.87

Main Takeaways

MAP significantly outperforms FedAvg and even specialized baselines like FedROD in scenarios with high missing class rates (e.g., 0.5).
Restricted Softmax (RS) is the primary driver for aggregation performance, preventing the 'weight divergence' caused by missing classes.
Inherited Private Models (HPM) are key for personalization, allowing clients to retain local knowledge that might be overwritten by global updates.
The method is robust to different levels of data heterogeneity (Dirichlet alpha) and missing class ratios.

📚 Prerequisite Knowledge

Prerequisites

Federated Learning (FedAvg algorithm)
Softmax cross-entropy loss
Knowledge Distillation (for the HPM component)
Non-I.I.D. data challenges in FL

Key Terms

RS: Restricted Softmax—a modified softmax that ignores missing classes in the denominator to prevent their weights from being negatively updated.

HPM: Inherited Private Model—a temporal ensemble of a client's past personalized models used to supervise current training.

Incomplete Classes: A scenario where a local client's dataset contains samples from only a subset of the total global class set.

Label Distribution Skew: A type of Non-I.I.D. data where the distribution of labels varies across clients.

FedAvg: Federated Averaging—the standard algorithm for FL where client updates are averaged to form a global model.