Evaluation Setup
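Federated learning with statistical heterogeneity, simulated by drawing per-client class proportions from a Dirichlet distribution (alpha=0.1 and alpha=0.5); a smaller alpha yields more skewed, and therefore more heterogeneous, label distributions across clients.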
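A common way to realize this kind of split is to draw, for every class, a vector of client proportions from Dirichlet(alpha) and divide that class's sample indices accordingly. The sketch below assumes that recipe and uses only the standard library. `partition_by_label` and `dominant_share` are hypothetical helpers, and the paper's exact procedure (for example, any minimum-samples-per-client constraint) is not stated here.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence

def dirichlet(alpha: float, k: int, rng: random.Random) -> List[float]:
    """Sample a symmetric Dirichlet(alpha) vector by normalizing Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]

def partition_by_label(labels: Sequence[int], num_clients: int, alpha: float,
                       seed: int = 0) -> Dict[int, List[int]]:
    """Hypothetical helper: split sample indices across clients class by class using Dirichlet proportions."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    clients: Dict[int, List[int]] = {c: [] for c in range(num_clients)}
    for y in sorted(by_class):
        idxs = by_class[y][:]  # copy before shuffling
        rng.shuffle(idxs)
        props = dirichlet(alpha, num_clients, rng)
        # Turn proportions into cut points; the last client takes the remainder.
        bounds, acc = [0], 0.0
        for p in props[:-1]:
            acc += p
            bounds.append(min(len(idxs), round(acc * len(idxs))))
        bounds.append(len(idxs))
        for c in range(num_clients):
            clients[c].extend(idxs[bounds[c]:bounds[c + 1]])
    return clients

def dominant_share(clients: Dict[int, List[int]], labels: Sequence[int]) -> float:
    """Hypothetical skew measure: mean fraction of each non-empty client's data in its most common class."""
    shares = []
    for idxs in clients.values():
        if idxs:
            counts = Counter(labels[i] for i in idxs)
            shares.append(max(counts.values()) / len(idxs))
    return sum(shares) / len(shares)
```

With `labels = [i % 10 for i in range(2000)]` and 10 clients, `dominant_share` comes out noticeably higher for alpha=0.1 than for a large alpha such as 100, matching the intuition that a smaller alpha produces more heterogeneous clients.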
Benchmarks:
- CIFAR-10 (Image Classification)
- CIFAR-100 (Image Classification)
- Fashion-MNIST (Image Classification)
- Tiny-ImageNet (Image Classification)
Metrics:
- Test Accuracy (%)
- Minimum Description Length (MDL) of the learned representations, in bits (lower is better)
- Statistical methodology: Not explicitly reported in the paper
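MDL in bits is often computed as an online (prequential) codelength: the first block of labels is sent with a uniform code, and each later block is encoded by a probe trained on all earlier examples, so more predictive representations yield fewer bits. The sketch below assumes this formulation and uses a toy count-based probe over hashable features. `fit_count_probe`, `online_codelength`, and the block boundaries are hypothetical; the paper's probe, block sizes, and exact MDL variant are not given in this summary.

```python
import math
from collections import Counter
from typing import Callable, Hashable, List, Sequence

Model = Callable[[Hashable], List[float]]  # maps a feature to class probabilities

def fit_count_probe(xs: Sequence[Hashable], ys: Sequence[int], num_classes: int) -> Model:
    """Toy probe (hypothetical): Laplace-smoothed p(y | x) from co-occurrence counts."""
    joint = Counter(zip(xs, ys))
    marginal = Counter(xs)
    def predict(x: Hashable) -> List[float]:
        return [(joint[(x, y)] + 1) / (marginal[x] + num_classes) for y in range(num_classes)]
    return predict

def online_codelength(xs: Sequence[Hashable], ys: Sequence[int], num_classes: int,
                      block_ends: Sequence[int],
                      fit: Callable[..., Model] = fit_count_probe) -> float:
    """Prequential codelength in bits; block_ends must be increasing and end at len(xs)."""
    # First block: transmitted with a uniform code over the classes.
    bits = block_ends[0] * math.log2(num_classes)
    for start, end in zip(block_ends, block_ends[1:]):
        model = fit(xs[:start], ys[:start], num_classes)  # probe sees only the prefix
        for x, y in zip(xs[start:end], ys[start:end]):
            bits -= math.log2(model(x)[y])  # cost of encoding y under the probe
    return bits
```

For example, with binary labels `ys = [i % 2 for i in range(64)]` and blocks ending at 8, 16, 32, and 64, features equal to the labels give a smaller codelength than a constant feature, mirroring how a lower MDL signals a more informative representation.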
Key Results
| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| Comparison with SOTA personalized FL methods on CIFAR-100 (alpha=0.1) using ResNet-18 | | | | |
| CIFAR-100 (alpha=0.1) | Accuracy (%) | 57.70 | 69.06 | +11.36 |
| CIFAR-100 (alpha=0.1) | Accuracy (%) | 54.67 | 69.06 | +14.39 |
| Improvement over traditional FL methods (generalization ability) on CIFAR-10 (alpha=0.1) | | | | |
| CIFAR-10 (alpha=0.1) | Accuracy (%) | 51.18 | 83.48 | +32.30 |
| CIFAR-10 (alpha=0.1) | MDL (bits) | 56.41 | 34.06 | -22.35 |
| Plug-and-play capability: DBE improving other FL baselines | | | | |
| CIFAR-100 (alpha=0.1) | Accuracy (%) | 56.97 | 68.61 | +11.64 |
| CIFAR-100 (alpha=0.1) | Accuracy (%) | 56.49 | 68.49 | +12.00 |
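Because accuracy improves upward while MDL improves downward, a negative Δ can still be a gain. The following sketch recomputes each Δ from the table's numbers and flags the direction per metric. `ResultRow`, `delta`, and `is_improvement` are hypothetical names, and the short group labels are paraphrases of the table's block captions.

```python
from typing import List, NamedTuple

class ResultRow(NamedTuple):
    group: str              # short paraphrase of the table block caption
    metric: str
    baseline: float
    this_paper: float
    reported_delta: float
    higher_is_better: bool  # accuracy: higher is better; MDL: lower is better

ROWS: List[ResultRow] = [
    ResultRow("SOTA personalized FL, CIFAR-100", "Accuracy (%)", 57.70, 69.06, 11.36, True),
    ResultRow("SOTA personalized FL, CIFAR-100", "Accuracy (%)", 54.67, 69.06, 14.39, True),
    ResultRow("Traditional FL, CIFAR-10", "Accuracy (%)", 51.18, 83.48, 32.30, True),
    ResultRow("Traditional FL, CIFAR-10", "MDL (bits)", 56.41, 34.06, -22.35, False),
    ResultRow("Plug-and-play, CIFAR-100", "Accuracy (%)", 56.97, 68.61, 11.64, True),
    ResultRow("Plug-and-play, CIFAR-100", "Accuracy (%)", 56.49, 68.49, 12.00, True),
]

def delta(row: ResultRow) -> float:
    """Delta = this paper minus baseline, rounded to the table's two decimals."""
    return round(row.this_paper - row.baseline, 2)

def is_improvement(row: ResultRow) -> bool:
    """Positive delta counts as a gain for higher-is-better metrics, negative delta otherwise."""
    d = delta(row)
    return d > 0 if row.higher_is_better else d < 0

if __name__ == "__main__":
    for row in ROWS:
        print(f"{row.group:34} {row.metric:13} delta={delta(row):+.2f} improved={is_improvement(row)}")
```

Every recomputed Δ matches the reported value, and every row, including the negative MDL Δ, counts as an improvement.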
Main Takeaways
- DBE (Domain Bias Eliminator) substantially improves both personalization (local accuracy) and generalization (global representation quality) across all tested datasets.
- The combination of Personalized Representation Bias Memory (PRBM) and Mean Regularization (MR) effectively decouples domain bias from generic features, as evidenced by the reduction in MDL (56.41 to 34.06 bits on CIFAR-10, alpha=0.1).
- DBE is highly compatible as a plug-and-play module, boosting the performance of various existing FL algorithms like FedProx, MOON, and FedGen.
- Performance gains are most pronounced in highly heterogeneous settings (e.g., alpha=0.1), validating the method's core premise of handling domain bias.
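To make the PRBM and MR takeaway more concrete, here is a minimal forward-pass sketch under simplifying assumptions: PRBM is modeled as adding a client-specific learnable bias vector to the generic features, and MR as a penalty `kappa` times the squared L2 distance between the client's mean generic representation and a global mean. The names `prbm_forward`, `mr_loss`, `bias_memory`, and `kappa` are hypothetical. Training, gradient updates, and how the server aggregates the global mean are not shown, and the paper's exact formulation may differ.

```python
from typing import List, Sequence

Vector = List[float]

def prbm_forward(z_generic: Sequence[float], bias_memory: Sequence[float]) -> Vector:
    """Hypothetical PRBM step: add a client-specific learnable bias vector to the generic features."""
    if len(z_generic) != len(bias_memory):
        raise ValueError("feature and bias-memory dimensions must match")
    return [g + b for g, b in zip(z_generic, bias_memory)]

def mean_vector(batch: Sequence[Sequence[float]]) -> Vector:
    """Coordinate-wise mean of a non-empty batch of equal-length vectors."""
    n = len(batch)
    return [sum(col) / n for col in zip(*batch)]

def mr_loss(local_generic: Sequence[Sequence[float]], global_mean: Sequence[float],
            kappa: float) -> float:
    """Hypothetical MR term: kappa * squared L2 distance between the client's mean
    generic representation and the global mean."""
    local_mean = mean_vector(local_generic)
    return kappa * sum((a - b) ** 2 for a, b in zip(local_mean, global_mean))
```

For instance, `prbm_forward([1.0, 2.0], [0.5, -1.0])` returns `[1.5, 1.0]`, and `mr_loss` is zero whenever a client's mean generic representation already matches the global mean. The idea is that the bias memory absorbs the client-specific shift while the regularized generic part stays aligned across clients.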