
FedGH: Heterogeneous Federated Learning with Generalized Global Header

Liping Yi, Gang Wang, Xiaoguang Liu, Zhuan Shi, Han Yu
Nankai University, University of Science and Technology of China, Nanyang Technological University
ACM Multimedia (2023)
P13N MM

📝 Paper Summary

Federated Learning Model Heterogeneity
FedGH enables diverse devices to collaborate in federated learning by training a shared global prediction header on class-averaged representations, decoupling it from heterogeneous local feature extractors.
Core Problem
Standard Federated Learning requires all clients to use identical model architectures (model homogeneity), excluding resource-constrained devices and failing to adapt to diverse local data distributions.
Why it matters:
  • Low-end edge devices cannot train the large models used by high-end servers, preventing them from participating in collaborative learning
  • Data on devices is Non-IID (not independently and identically distributed), meaning a single global model often performs poorly on local personalized data
  • Existing model-heterogeneous solutions often rely on public datasets (which may be unavailable in practice) or on heavy knowledge distillation, which incurs high computation and communication costs
Concrete Example: In a visual classification task, a smartwatch can only run a tiny 3-layer CNN, while a powerful server runs ResNet-18. FedAvg fails because the two models have different layers and parameter shapes, so their weights cannot be averaged element-wise. FedGH allows collaboration by sharing only the prediction header and class-averaged representations, not the full architecture.
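To make the mismatch concrete, here is a minimal sketch. The layer names and shapes are hypothetical, not the paper's actual architectures. It assumes both extractors emit representations of the same size, so only the header has a common shape across the two models. FedGH trains that header on the server rather than averaging it; the sketch only illustrates why the header is the one component that can be shared.

```python
# Hypothetical parameter shapes for two clients; names and sizes are
# illustrative assumptions, not taken from the paper.
small_cnn = {
    "conv1": (16, 3, 5, 5),
    "fc": (64, 400),
    "header": (10, 64),   # maps a 64-dim representation to 10 classes
}
resnet_like = {
    "conv1": (64, 3, 7, 7),
    "layer1": (64, 64, 3, 3),
    "header": (10, 64),   # same shape as the small model's header
}


def shareable_layers(model_a, model_b):
    """Return names of layers present in both models with identical shapes."""
    return sorted(
        name for name, shape in model_a.items()
        if model_b.get(name) == shape
    )


shared = shareable_layers(small_cnn, resnet_like)
print(shared)  # ['header']
```

Element-wise averaging in the style of FedAvg is only defined for the layers this function returns. Here that is the header alone.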
Key Novelty
Generalized Global Prediction Header Training via Local Averaged Representations (LARs)
  • Decouples the model into a personalized, heterogeneous feature extractor (kept local) and a homogeneous prediction header (shared)
  • Clients compute 'prototypes' of their data (average representation per class) and send these to the server instead of model weights or raw data
  • The server trains the shared header on these lightweight representations and broadcasts it back, replacing the client's local header to inject global knowledge
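The three steps above can be sketched in plain Python. This is a toy illustration under stated assumptions, not the authors' implementation: the header is assumed to be a single linear layer trained with softmax cross-entropy and plain SGD. All function names, hyperparameters, and the 2-D toy representations are hypothetical.

```python
import math
import random


def local_averaged_representations(features, labels):
    """Client side: average the extractor outputs per class (one LAR per class)."""
    sums, counts = {}, {}
    for vec, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def scores(header, vec):
    """Linear header: one logit per class."""
    W, b = header
    return [sum(w * x for w, x in zip(row, vec)) + bias for row, bias in zip(W, b)]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_global_header(uploads, num_classes, dim, lr=0.5, epochs=200, seed=0):
    """Server side: fit the shared header on (LAR, class) pairs from all clients."""
    rng = random.Random(seed)
    W = [[rng.uniform(-0.01, 0.01) for _ in range(dim)] for _ in range(num_classes)]
    b = [0.0] * num_classes
    samples = [(vec, y) for lars in uploads for y, vec in lars.items()]
    for _ in range(epochs):
        for vec, y in samples:
            probs = softmax(scores((W, b), vec))
            for c in range(num_classes):
                grad = probs[c] - (1.0 if c == y else 0.0)  # cross-entropy gradient
                for i in range(dim):
                    W[c][i] -= lr * grad * vec[i]
                b[c] -= lr * grad
    return W, b


def predict(header, vec):
    logits = scores(header, vec)
    return logits.index(max(logits))


# Toy 2-D representations, as if produced by two different local extractors.
lars_a = local_averaged_representations(
    [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]], [0, 0, 1])
lars_b = local_averaged_representations(
    [[0.1, 0.9], [0.3, 0.7], [0.9, 0.1]], [1, 1, 0])

# Only the per-class averages leave the clients; the server never sees
# raw data or extractor weights.
global_header = train_global_header([lars_a, lars_b], num_classes=2, dim=2)

# Each client would now replace its local header with global_header.
print(predict(global_header, [1.0, 0.0]), predict(global_header, [0.0, 1.0]))
```

In this sketch each client uploads at most one vector per class, so the upload size depends on the number of classes and the representation dimension rather than on the size of the local model.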
Architecture
Figure 1: The FedGH workflow, illustrating the separation of local heterogeneous extractors and the shared global header.
Evaluation Highlights
  • Outperforms state-of-the-art FedProto by +1.33% accuracy on CIFAR-100 in model-heterogeneous settings
  • Achieves significantly higher accuracy in homogeneous settings (+11.11% vs FedProto on CIFAR-100, N=10)
  • Reduces communication overhead by 85.53% compared to the best-performing baseline (FedProto) on CIFAR-100 while achieving higher accuracy
Breakthrough Assessment
8/10
Offers a simple yet highly effective solution to model heterogeneity that beats complex distillation methods in accuracy and efficiency without needing public data.