pFL: Personalized Federated Learning—techniques to adapt the global FL model to individual client data distributions.
Non-IID: Non-Independent and Identically Distributed—data on different clients follows different statistical distributions.
Model Decoupling: Splitting a neural network into a feature extractor (lower layers) and a classifier (final layers) to treat them differently during training/aggregation.
CKA: Centered Kernel Alignment—a similarity index for comparing representations of neural network layers.
Generalization Phase: An initial warm-up period running standard FedAvg to get a reasonable base model before personalization begins.
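Dirichlet distribution: A probability distribution used here to partition data among clients and simulate varying degrees of non-IID heterogeneity. The concentration parameter $\alpha$ controls the degree: smaller values give more heterogeneous client data, while larger values approach an IID split.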
Feature Extractor: The initial layers of a network (e.g., convolutional layers) that transform raw inputs into latent representations.
Classifier: The final layers of a network (e.g., fully connected layers) that map latent representations to output class probabilities.
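As an illustration of the Dirichlet distribution entry, the following is a minimal sketch of a Dirichlet-based client partition, not necessarily the exact procedure used in this work. It assumes the common label-wise variant, in which each class's samples are split among clients according to proportions drawn from a symmetric Dirichlet with concentration $\alpha$. The function name `dirichlet_partition` and its parameters are hypothetical, and only the Python standard library is used.

```python
import random
from collections import defaultdict


def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices among clients, class by class.

    Assumption (not taken from the source): for every class, the share
    each client receives is drawn from a symmetric Dirichlet(alpha).
    Returns a list of index lists, one per client.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    clients = [[] for _ in range(num_clients)]
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)

        # A symmetric Dirichlet draw is a vector of independent
        # Gamma(alpha, 1) samples normalized to sum to 1.
        gammas = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(gammas)
        if total == 0.0:  # numerical underflow for very small alpha
            shares = [1.0 / num_clients] * num_clients
        else:
            shares = [g / total for g in gammas]

        # Turn cumulative shares into cut points over this class's samples.
        start, cumulative = 0, 0.0
        for c in range(num_clients):
            cumulative += shares[c]
            if c == num_clients - 1:
                end = len(idxs)  # last client takes the remainder
            else:
                end = min(len(idxs), max(start, round(cumulative * len(idxs))))
            clients[c].extend(idxs[start:end])
            start = end
    return clients


if __name__ == "__main__":
    toy_labels = [i % 10 for i in range(1000)]  # 10 balanced classes
    parts = dirichlet_partition(toy_labels, num_clients=5, alpha=0.5)
    print([len(p) for p in parts])
```

Every sample is assigned to exactly one client, so the client index lists together cover the whole dataset. Rerunning with a smaller `alpha` tends to concentrate each class on fewer clients, while a large `alpha` gives splits that are close to uniform.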