MHPFL: Model-Heterogeneous Personalized Federated Learning. Federated learning (FL) in which clients use different model architectures (e.g., ResNet vs. MobileNet) yet still collaborate, so that each client obtains a personalized model that performs well on its own data.
MoE: Mixture of Experts. A neural network architecture in which a gating network decides, for each input, which sub-models (experts) are activated or how strongly each one contributes to the output.
non-IID: Not Independent and Identically Distributed. The data distribution differs from client to client (e.g., one client holds only cat images, another only dog images).
Gating Network: A small neural network that outputs probability weights to mix the outputs of different expert models.
Feature Extractor: The initial layers of a neural network that map raw input (e.g., pixels) to a latent vector representation.
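FedAvg: Federated Averaging. The baseline FL algorithm: each client trains the shared model on its local data for a few epochs, and a central server averages the returned weights, usually weighting each client by its number of training samples. Because parameters are averaged element by element, all clients must share the same architecture.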
Knowledge Distillation: Training a student model to mimic the output (logits/features) of a teacher model.
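To make the MoE and Gating Network entries concrete, here is a minimal pure-Python sketch of a dense mixture of experts: every expert runs, and a softmax gate weights their outputs. The linear gate (`gate_params`, one weight vector per expert) and the two toy experts (`double`, `negate`) are hypothetical choices made for illustration; real MoE layers use learned networks, and many keep only the top-k gate entries instead of running every expert.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax: maps gate scores to weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(
    x: Vector,
    experts: Sequence[Callable[[Vector], Vector]],
    gate_params: Sequence[Vector],
) -> Vector:
    """Dense MoE forward pass: run every expert, mix outputs with gate weights.

    gate_params is a hypothetical linear gating network with one weight
    vector per expert: score_i = dot(gate_params[i], x).
    """
    scores = [sum(w * xi for w, xi in zip(g, x)) for g in gate_params]
    gate = softmax(scores)  # one probability per expert
    outputs = [expert(x) for expert in experts]
    dim = len(outputs[0])
    # Weighted sum of expert outputs, coordinate by coordinate.
    return [sum(p * out[j] for p, out in zip(gate, outputs)) for j in range(dim)]


# Two hypothetical toy experts.
def double(v: Vector) -> Vector:
    return [2.0 * a for a in v]


def negate(v: Vector) -> Vector:
    return [-a for a in v]
```

With all-zero gate parameters both experts receive weight 0.5, so `moe_forward([1.0, 0.0], [double, negate], [[0.0, 0.0], [0.0, 0.0]])` averages `[2.0, 0.0]` and `[-1.0, 0.0]`; making the first gate row large (e.g. `[10.0, 0.0]`) routes almost all of the weight to `double`.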
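The server side of FedAvg reduces to a sample-weighted average of client parameters. The sketch below shows only that aggregation step (local client training is omitted) and assumes a hypothetical representation in which each parameter tensor is flattened into a list of floats under a parameter name. It also makes visible why FedAvg needs identical architectures: parameters are matched by name and position.

```python
from typing import Dict, List, Sequence

# Hypothetical representation: each parameter tensor flattened to a list.
Weights = Dict[str, List[float]]


def fedavg_aggregate(client_weights: Sequence[Weights], num_samples: Sequence[int]) -> Weights:
    """One server-side FedAvg aggregation: sample-weighted mean of client parameters.

    Assumes every client uses the same architecture (identical parameter names
    and lengths). A length mismatch raises ValueError; a parameter missing from
    some client raises KeyError.
    """
    if not client_weights or len(client_weights) != len(num_samples):
        raise ValueError("need at least one client and one sample count per client")
    total = sum(num_samples)
    averaged: Weights = {}
    for name, reference in client_weights[0].items():
        acc = [0.0] * len(reference)
        for weights, n in zip(client_weights, num_samples):
            param = weights[name]
            if len(param) != len(reference):
                raise ValueError(f"shape mismatch for parameter {name!r}")
            # Each client contributes in proportion to its share of the data.
            for i, value in enumerate(param):
                acc[i] += (n / total) * value
        averaged[name] = acc
    return averaged
```

For example, a client with 1 sample holding `w = [1.0, 2.0]` and a client with 3 samples holding `w = [3.0, 4.0]` average to `w = [2.5, 3.5]`, since the second client carries weight 3/4.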
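Knowledge distillation can be implemented in several ways; one common logit-based choice is the temperature-softened soft-target loss popularized by Hinton et al.: soften both teachers' and students' logits with a temperature T, take the KL divergence from teacher to student, and scale by T^2. The pure-Python sketch below assumes that formulation (feature-based distillation is not shown), and the function names are illustrative. In practice this term is usually added to an ordinary cross-entropy loss on the true labels.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float], temperature: float) -> List[float]:
    """Temperature-scaled log-softmax, computed stably via the log-sum-exp trick."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_norm for z in scaled]


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    temperature: float = 2.0,
) -> float:
    """Soft-target loss: T^2 * KL(teacher || student) on softened distributions."""
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    # KL(p || q) = sum_i p_i * (log p_i - log q_i), with p = teacher, q = student.
    kl = sum(
        math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_teacher, log_p_student)
    )
    # Scaling by T^2 keeps gradient magnitudes comparable across temperatures.
    return temperature**2 * kl
```

The loss is zero when the student's logits match the teacher's and grows as the student's softened distribution drifts away from the teacher's.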
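The cat/dog example in the non-IID entry corresponds to an extreme label-skew split, in which each label lives on exactly one client. The sketch below builds such a split from hypothetical `(sample_id, label)` records and a hypothetical `label_to_client` mapping; experimental setups often use milder skew instead (for instance, sampling per-client label proportions from a Dirichlet distribution), which this sketch does not cover.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[str, str]  # (sample_id, label); a hypothetical record format


def split_by_label(
    samples: Sequence[Sample], label_to_client: Dict[str, str]
) -> Dict[str, List[Sample]]:
    """Pathological non-IID split: every label is owned by exactly one client."""
    shards: Dict[str, List[Sample]] = defaultdict(list)
    for sample_id, label in samples:
        shards[label_to_client[label]].append((sample_id, label))
    return dict(shards)


def label_histogram(shard: Sequence[Sample]) -> Counter:
    """Count how many samples of each label a client holds."""
    return Counter(label for _, label in shard)
```

Comparing `label_histogram` across clients shows the skew directly: under this split, one client's histogram contains only `"cat"` and the other's only `"dog"`, whereas an IID split would give every client roughly the same label mix.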