Eliminating Domain Bias for Federated Learning in Representation Space

📝 Paper Summary

Personalized Federated Learning, Representation Learning

DBE addresses representation bias in federated learning by separating client-specific bias into a local memory while regularizing feature extractors toward a global consensual mean.

Core Problem

In statistically heterogeneous federated learning, local training on biased data domains causes the global model to learn biased representations (representation bias) and lose generic representation quality (representation degeneration).

Why it matters:

Standard FedAvg suffers accuracy drops when data is non-IID because local updates pull the model toward local biases, damaging its generalization.
Existing personalized FL methods often keep classifiers local but fail to correct the feature extractor, which still learns biased features from skewed local data.
The conflict between learning client-specific features for personalization and client-invariant features for global aggregation hinders effective collaboration.

Concrete Example: A client might only have images of 'dogs' and 'cats' but not 'birds'. Training locally makes the feature extractor cluster dog/cat features tightly but degrade the representation space for 'birds', causing the global model to perform poorly on 'birds' for other clients.

Key Novelty

Domain Bias Eliminator (DBE)

Decouples feature representation into two parts: a client-invariant 'global' representation and a client-specific 'bias' term stored locally.
Uses a Personalized Representation Bias Memory (PRBM) to store the offset (bias) for each client, allowing the feature extractor to focus on generic features.
Applies Mean Regularization (MR) to force the local feature extractor's output mean to align with a global consensus mean, preventing drift into local clusters.

Architecture

Comparison of local training processes between traditional FL/pFL and the proposed DBE framework.

Evaluation Highlights

Outperforms state-of-the-art personalized FL methods by up to 11.36 percentage points of accuracy on CIFAR-100 (heterogeneous setting).
Improves standard FedAvg accuracy by 32.30 percentage points and reduces representation complexity (MDL) by 22.35 bits in heterogeneous scenarios.
Consistently enhances multiple FL baselines (FedProx, MOON, FedGen) when integrated as a plug-and-play module.

Breakthrough Assessment

8/10

Offers a fundamental structural improvement (bias decoupling) for FL that is model-agnostic and yields large gains over strong baselines. The theoretical grounding in generalization bounds adds significant weight.

⚙️ Technical Details

Problem Definition

Setting: Multi-class classification in a Federated Learning setting with N clients, where each client i has a private, biased data domain D_i (non-IID data distribution).

Inputs: Local dataset D_i containing input samples x and labels y.

Outputs: A global model g (feature extractor f + classifier h) and personalized bias parameters for each client.

Pipeline Flow

Local Training: Feature Extractor → PRBM (adds bias) → Classifier
Mean Regularization: Regularize Feature Extractor output toward Global Mean
Server Aggregation: Aggregate Feature Extractor and Global Mean; PRBM stays local
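
The local forward pass in the flow above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the single linear layers standing in for f and h, the toy dimensions, and the helper names (`feature_extractor`, `apply_prbm`, `classifier`) are all assumptions made for the example.

```python
import math

def feature_extractor(x, weights):
    """Stand-in for f: one linear layer mapping x to z_g (hypothetical)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def apply_prbm(z_g, z_p):
    """PRBM step: translate the global representation by the client bias z_p."""
    return [g + p for g, p in zip(z_g, z_p)]

def classifier(z, weights):
    """Stand-in for h: linear layer followed by a numerically stable softmax."""
    logits = [sum(w * zi for w, zi in zip(row, z)) for row in weights]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy sizes: 3 input features, K = 2 representation dims, 2 classes.
f_weights = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]  # extractor: aggregated by the server
h_weights = [[1.0, -1.0], [-1.0, 1.0]]            # classifier weights (toy values)
z_p = [0.25, -0.75]                               # PRBM vector: stays on the client

x = [1.0, 2.0, 3.0]
z_g = feature_extractor(x, f_weights)  # intended client-invariant part
z = apply_prbm(z_g, z_p)               # personalized representation
probs = classifier(z, h_weights)       # class probabilities
```

Because the PRBM is a pure translation, it adds only K parameters per client and leaves the extractor's output `z_g` available for Mean Regularization.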

System Modules

Feature Extractor (f)

Maps input x to a 'global' representation z_g intended to be client-invariant

Model or implementation: CNN (e.g., simple 2-conv layer network) or ResNet-18/MobileNetV2 depending on dataset

PRBM (Personalized Representation Bias Memory)

Stores client-specific bias vector z_p; acts as a translation transformation

Model or implementation: Trainable vector parameter (size K, dimension of feature space)

Mean Regularization (MR)

Loss term to align local feature mean with global consensus mean

Model or implementation: MSE Loss component

Classifier (h)

Maps personalized representation z to class probabilities

Model or implementation: Fully Connected Layer(s)

Novel Architectural Elements

Split representation level: Decomposes the standard latent vector into a global component (output of conv net) and a local component (PRBM vector).
Dual-path optimization: Optimizes global model for generic features via MR while optimizing local bias memory for personalization.

Modeling

Base Model: SimpleCNN (2 conv layers) for CIFAR-10/Fashion-MNIST; ResNet-18/MobileNetV2 for CIFAR-100/Tiny-ImageNet

Training Method: Federated Learning with SGD on clients

Objective Functions:

Purpose: Minimize classification error using personalized representations.
Formally: L_CE = CrossEntropy(h(f(x) + z_p), y)

Purpose: Regularize feature extractor to align with global mean.
Formally: L_MR = κ * || z_bar_g_local - z_bar_g_global ||^2

Purpose: Total Local Loss.
Formally: L_total = L_CE + L_MR
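
As a worked example of the three objectives, the sketch below evaluates them on a toy batch. It follows the squared-norm form of L_MR written above (an MSE variant would differ by a constant factor of 1/K); averaging L_CE over the batch and the function names are assumptions for illustration, not details confirmed by the summary.

```python
import math

def cross_entropy(probs, label):
    """L_CE for one sample: negative log-probability of the true class."""
    return -math.log(probs[label])

def mean_regularizer(local_mean, global_mean, kappa):
    """L_MR = kappa * squared L2 distance between the two mean vectors."""
    return kappa * sum((l - g) ** 2 for l, g in zip(local_mean, global_mean))

def total_loss(batch_probs, labels, local_mean, global_mean, kappa):
    """L_total = batch-averaged L_CE + L_MR (batch averaging is assumed)."""
    ce = sum(cross_entropy(p, y) for p, y in zip(batch_probs, labels)) / len(labels)
    return ce + mean_regularizer(local_mean, global_mean, kappa)

# Toy batch: classifier outputs h(f(x) + z_p) for two samples.
batch_probs = [[0.5, 0.5], [0.25, 0.75]]
labels = [0, 1]
local_mean = [1.0, 2.0]    # mean of z_g on this client (toy values)
global_mean = [0.0, 0.0]   # consensus mean received from the server
loss = total_loss(batch_probs, labels, local_mean, global_mean, kappa=1.0)
```

Note that L_MR depends only on the extractor output mean, so it constrains f without touching the PRBM vector z_p, which is trained through L_CE alone.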

Key Hyperparameters:

learning_rate: 0.01
batch_size: 32
local_epochs: 5
optimizer: SGD
weight_decay: 1e-5
momentum: 0.9
kappa (MR weight): 0.1 to 2.0 (varies by dataset, typically 1.0 or 2.0)

Compute: Not reported in the paper

Comparison to Prior Work

vs. FedPer/FedRep: These methods leave the feature extractor unguided, leading to biased features; DBE explicitly regularizes the extractor with MR and moves the client-specific bias into the PRBM.
vs. FedGen: FedGen requires a generator on the server and extra communication; DBE is lightweight with only a mean vector communicated.
vs. Ditto [not cited in paper]: Ditto uses multi-task learning with a regularization term to keep local models close to global; DBE structurally separates bias rather than just constraining weights.

Limitations

The global mean calculation requires an extra round of communication or initialization before training starts.
The hyperparameter kappa for Mean Regularization needs tuning for different datasets.
Assumes all clients share the same model architecture (feature extractor).
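
The global mean mentioned in the first limitation could be obtained roughly as follows: each client reports the mean of its extractor outputs together with its sample count, and the server combines them. This is a sketch under stated assumptions; weighting by sample count and the function names are illustrative choices, not confirmed details of DBE.

```python
def client_mean(representations):
    """Per-dimension mean of one client's z_g vectors, plus the sample count."""
    n = len(representations)
    dim = len(representations[0])
    mean = [sum(z[k] for z in representations) / n for k in range(dim)]
    return mean, n

def global_mean(client_stats):
    """Sample-weighted average of client means (the weighting is an assumption)."""
    total = sum(n for _, n in client_stats)
    dim = len(client_stats[0][0])
    return [sum(mean[k] * n for mean, n in client_stats) / total for k in range(dim)]

# Toy extractor outputs for two clients with different amounts of data.
client_a = [[1.0, 0.0], [3.0, 2.0]]                           # 2 samples
client_b = [[0.0, 4.0], [0.0, 4.0], [0.0, 4.0], [0.0, 4.0]]   # 4 samples
stats = [client_mean(client_a), client_mean(client_b)]
consensus = global_mean(stats)  # only K-dimensional vectors and counts are shared
```

Only a K-dimensional vector and a count leave each client in this sketch, which is consistent with the lightweight communication cost described in the comparison to FedGen.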

Reproducibility

Code: https://github.com/TsingZ0/DBE

Datasets (CIFAR-10, CIFAR-100, Fashion-MNIST, Tiny-ImageNet) are standard public benchmarks, and hyperparameters are detailed in the paper.

📊 Experiments & Results

Evaluation Setup

Federated Learning with statistical heterogeneity simulated by Dirichlet distribution (alpha=0.1, 0.5) over classes.
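
A common recipe for this kind of Dirichlet split is shown below: for each class, client proportions are drawn from a symmetric Dirichlet(alpha) and that class's samples are divided accordingly. This is a generic sketch using only the standard library, not necessarily the exact partitioning protocol used in the paper; the function names and the rounding rule are assumptions.

```python
import random

def sample_dirichlet(alpha, n, rng):
    """Draw one symmetric Dirichlet(alpha) vector of length n via gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients, one class at a time."""
    rng = random.Random(seed)
    clients = [[] for _ in range(num_clients)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        props = sample_dirichlet(alpha, num_clients, rng)
        start, cum = 0, 0.0
        for c, p in enumerate(props):
            cum += p
            # The last client takes the remainder so every index is assigned once.
            end = len(idx) if c == num_clients - 1 else round(cum * len(idx))
            clients[c].extend(idx[start:end])
            start = end
    return clients

# Toy dataset: 1000 samples, 10 balanced classes, 5 clients.
labels = [i % 10 for i in range(1000)]
parts = dirichlet_partition(labels, num_clients=5, alpha=0.1)
sizes = [len(p) for p in parts]
```

Smaller alpha concentrates each class on fewer clients, so alpha=0.1 yields a more skewed split than alpha=0.5.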

Benchmarks:

CIFAR-10 (Image Classification)
CIFAR-100 (Image Classification)
Fashion-MNIST (Image Classification)
Tiny-ImageNet (Image Classification)

Metrics:

Test Accuracy (%)
Minimum Description Length (MDL) in bits

Statistical methodology: Not explicitly reported in the paper

Key Results

Benchmark	Metric	Baseline	This Paper	Δ
Comparison with SOTA Personalized FL methods on CIFAR-100 (alpha=0.1) using ResNet-18.
CIFAR-100 (alpha=0.1)	Accuracy (%)	57.70	69.06	+11.36
CIFAR-100 (alpha=0.1)	Accuracy (%)	54.67	69.06	+14.39
Improvement over traditional FL methods (Generalization Ability) on CIFAR-10 (alpha=0.1).
CIFAR-10 (alpha=0.1)	Accuracy (%)	51.18	83.48	+32.30
CIFAR-10 (alpha=0.1)	MDL (bits)	56.41	34.06	-22.35
Plug-and-play capability: DBE improving other FL baselines.
CIFAR-100 (alpha=0.1)	Accuracy (%)	56.97	68.61	+11.64
CIFAR-100 (alpha=0.1)	Accuracy (%)	56.49	68.49	+12.00

Experiment Figures

Visualization of representation bias and degeneration in FedAvg.

t-SNE visualization of representations with and without DBE on CIFAR-10.

Main Takeaways

DBE significantly improves both personalization (local accuracy) and generalization (global representation quality) across all tested datasets.
The combination of PRBM and MR effectively decouples bias from generic features, as evidenced by the substantial reduction in MDL scores.
DBE is highly compatible as a plug-and-play module, boosting the performance of various existing FL algorithms like FedProx, MOON, and FedGen.
Performance gains are most pronounced in highly heterogeneous settings (e.g., alpha=0.1), validating the method's core premise of handling domain bias.

📚 Prerequisite Knowledge

Prerequisites

Federated Learning (FedAvg algorithm)
Feature Representation Learning
Statistical Heterogeneity (Non-IID data)

Key Terms

MDL: Minimum Description Length—a metric measuring the complexity or 'length' of encoding data given a model; lower values indicate better generalization/representation quality.

representation bias: The phenomenon where a model trained on skewed local data forms clusters specific to that client's domain rather than generic features useful for all.

representation degeneration: The decrease in the quality of generic representations (measured by MDL) due to training on datasets with missing labels or classes.

PRBM: Personalized Representation Bias Memory—a local trainable vector that stores the client-specific bias, separated from the feature extractor's output.

MR: Mean Regularization—a loss term that encourages the mean of local feature representations to align with a global average calculated across all clients.

FedAvg: Federated Averaging—the standard algorithm for FL where client model weights are averaged by a central server.

pFL: Personalized Federated Learning—FL variants that learn distinct models for each client to handle heterogeneity.