FedLECC: Cluster- and Loss-Guided Client Selection for Federated Learning under Non-IID Data

📝 Paper Summary

Federated Learning (FL), Edge Intelligence

FedLECC improves federated learning under non-IID data by clustering clients based on label distributions and prioritizing those with high local loss to select informative yet diverse updates.

Core Problem

In cross-device FL, non-IID data (specifically label skew) causes client updates to diverge, degrading model convergence and accuracy, especially when random client selection is used.

Why it matters:

Label skew is common in cloud-edge deployments where clients capture localized events or user-specific behaviors.
Naive selection strategies waste communication resources on redundant or low-impact updates, slowing convergence.
Uniform sampling is suboptimal in heterogeneous environments where only a small fraction of clients can participate due to bandwidth constraints.

Concrete Example: Consider a predictive maintenance scenario where different edge devices monitor different machine types (disjoint labels). Randomly selecting clients might repeatedly sample devices with 'Machine Type A' data while ignoring 'Machine Type B', leading to a global model that fails to detect faults in Type B.

Key Novelty

Cluster-Aware Loss-Guided Selection

Groups clients into clusters using Hellinger distance between their label distributions to identify devices with similar data characteristics.
Selects clients by first choosing clusters with high average loss, then picking specific clients within those clusters with the highest local loss.
Jointly enforces diversity (via clustering) and informativeness (via loss prioritization) to prevent over-specialization to specific data modes.

Architecture

Conceptual overview of the FedLECC strategy stages.

Evaluation Highlights

About +12% relative test accuracy improvement on FMNIST under severe label skew (73.2 to 82.0, i.e., +8.8 points) compared to FedAvg and strong baselines.
Reduces communication rounds by approximately 22% to reach target accuracy compared to FedAvg.
Reduces overall communication overhead by up to 50% compared to strong baselines like FedCor and POC.

Breakthrough Assessment

7/10

Solid systems contribution for FL. Effectively combines two known heuristics (clustering and loss-based selection) to address the specific problem of label skew, showing significant efficiency gains.

⚙️ Technical Details

Problem Definition

Setting: Federated Learning with K clients, where each client i holds a local dataset D_i and trains a local model parameterized by theta_i on its own data; the server aggregates the client updates so that the global model theta minimizes the global empirical risk.
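For reference, this setting is commonly written as the standard FedAvg-style weighted objective. The formulation below is an assumed textbook form, not necessarily the paper's notation; the symbols F, F_i, \ell, n_i, and n are introduced here for illustration only.

```latex
\min_{\theta} \; F(\theta) = \sum_{i=1}^{K} \frac{n_i}{n} \, F_i(\theta),
\qquad
F_i(\theta) = \frac{1}{n_i} \sum_{(x, y) \in D_i} \ell(\theta; x, y),
\qquad
n_i = |D_i|, \quad n = \sum_{i=1}^{K} n_i .
```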

Inputs: Client local datasets (disjoint across clients), global model parameters

Outputs: Updated global model theta

Pipeline Flow

Client Profiling (Quantify non-IID)
Server Clustering (Group clients)
Dynamic Selection (Round-based selection)

System Modules

Label Histogram Reporter

Clients send normalized label histograms to the server to characterize their local data distribution
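A minimal sketch of what a client might compute before reporting, assuming integer class labels in the range 0 to num_classes - 1. The function name `label_histogram` is hypothetical and not taken from the paper.

```python
from collections import Counter


def label_histogram(labels, num_classes):
    """Return the normalized label histogram of one client's local labels."""
    if not labels:
        raise ValueError("client has no local samples")
    counts = Counter(labels)
    total = len(labels)
    # One entry per class, including classes the client never observed.
    return [counts.get(c, 0) / total for c in range(num_classes)]


# Example: a client holding four samples from classes 0, 0, 1, and 3.
hist = label_histogram([0, 0, 1, 3], num_classes=4)
```

Only the histogram leaves the device in this sketch; the raw samples stay local.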

Clustering Engine

Groups clients with similar label distributions using OPTICS clustering on pairwise Hellinger Distances
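The pairwise distances this module relies on can be sketched as follows, using the standard discrete Hellinger distance H(p, q) = sqrt(0.5 * sum_c (sqrt(p_c) - sqrt(q_c))^2). The helper names are hypothetical, and the clustering step itself is left out because this summary does not state the OPTICS parameters used.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions of equal length."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of classes")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared / 2.0)


def pairwise_hellinger(histograms):
    """Symmetric K x K distance matrix over client label histograms."""
    k = len(histograms)
    dist = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            dist[i][j] = dist[j][i] = hellinger(histograms[i], histograms[j])
    return dist


# Three toy clients: two with disjoint labels, one perfectly balanced.
hists = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
D = pairwise_hellinger(hists)
```

A matrix like `D` could then be handed to any OPTICS implementation that accepts precomputed distances, for example scikit-learn's `OPTICS(metric="precomputed")`.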

Selection Manager

Selects J clusters with highest average loss, then selects z high-loss clients per cluster
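The two-stage rule can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: clusters are assumed non-empty, losses are the latest values known to the server, and the summary does not say how OPTICS noise points are treated. The names `select_clients`, `num_clusters` (J), and `clients_per_cluster` (z) are hypothetical.

```python
def select_clients(clusters, losses, num_clusters, clients_per_cluster):
    """Two-stage selection: top-J clusters by mean loss, then top-z clients in each.

    clusters: dict mapping cluster id -> non-empty list of client ids
    losses:   dict mapping client id -> latest known local loss
    """
    def mean_loss(members):
        return sum(losses[c] for c in members) / len(members)

    # Stage 1: rank clusters by average local loss, highest first.
    ranked = sorted(clusters, key=lambda cid: mean_loss(clusters[cid]), reverse=True)
    selected = []
    for cid in ranked[:num_clusters]:
        # Stage 2: within each chosen cluster, take the highest-loss clients.
        members = sorted(clusters[cid], key=lambda c: losses[c], reverse=True)
        selected.extend(members[:clients_per_cluster])
    return selected


clusters = {"a": [0, 1, 2], "b": [3, 4], "c": [5]}
losses = {0: 0.2, 1: 0.9, 2: 0.4, 3: 1.5, 4: 1.1, 5: 0.1}
chosen = select_clients(clusters, losses, num_clusters=2, clients_per_cluster=2)
```

In this toy run, cluster "b" (mean loss 1.3) and cluster "a" (mean loss 0.5) are chosen, and the low-loss cluster "c" is skipped for the round.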

Novel Architectural Elements

Two-stage selection pipeline that integrates static label-distribution clustering with dynamic loss-based prioritization

Modeling

Base Model: Multilayer Perceptron (MLP) with two hidden layers (200 neurons)

Training Method: Stochastic Gradient Descent (SGD)

Training Data:

MNIST (partitioned via Dirichlet distribution)
FMNIST (partitioned via Dirichlet distribution)
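A generic sketch of Dirichlet-based label-skew partitioning, in which smaller `alpha` values yield more skewed clients. This is a common scheme written from scratch for illustration; it is not the FedArtML API, and the function name and parameters are hypothetical.

```python
import random


def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients with Dirichlet-controlled label skew."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)

    parts = [[] for _ in range(num_clients)]
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        # Dirichlet(alpha, ..., alpha) sample via normalized Gamma draws.
        gammas = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(gammas) or 1.0  # guard against an all-zero underflow
        start, cum = 0, 0.0
        for client, g in enumerate(gammas):
            cum += g / total
            # Last client takes the remainder so every index is assigned once.
            end = len(idxs) if client == num_clients - 1 else round(cum * len(idxs))
            parts[client].extend(idxs[start:end])
            start = end
    return parts


# Toy data: 1000 samples spread evenly over 10 classes, split across 5 clients.
labels = [i % 10 for i in range(1000)]
parts = dirichlet_partition(labels, num_clients=5, alpha=0.1)
```

Each class is split independently, so with a small `alpha` most of a class tends to land on one or two clients.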

Key Hyperparameters:

learning_rate: 0.005
batch_size: 64
communication_rounds: 150
alpha_dirichlet: Values inducing HD approx 0.9 (severe skew)

Compute: Not reported in the paper

Comparison to Prior Work

vs. FedProx/FedNova/FedDyn: FedLECC is a selection strategy, not a regularization term; it can be complementary.
vs. POC: POC only uses loss; FedLECC constrains loss-based selection within clusters to ensure diversity.
vs. HACCS: HACCS prioritizes latency/stragglers; FedLECC prioritizes informative (high-loss) updates.
vs. Oort [not cited in paper]: Oort also combines system speed with utility (loss), but FedLECC focuses specifically on label-distribution clustering rather than system latency.

Limitations

Relies on clients sharing label histograms, which may have privacy implications (though paper suggests Differential Privacy can mitigate this).
Clustering is computed based on initial data; handling dynamic data drift (concept drift) where label distributions change over time is not detailed.
Evaluated only on relatively simple image datasets (MNIST, FMNIST) and MLP models.
Requires an initial communication step to gather loss values from potential candidates or relies on stale loss values.

Reproducibility

The paper text does not indicate whether code is available. The datasets (MNIST, FMNIST) are public, and the partitioning tool (FedArtML) is cited.

📊 Experiments & Results

Evaluation Setup

Federated Learning simulation with K=100 clients under severe label skew (Dirichlet partition).

Benchmarks:

MNIST (Image Classification)
FMNIST (Fashion-MNIST) (Image Classification)

Metrics:

Test Accuracy
Communication Rounds
Total Communication Overhead
Statistical methodology: Results averaged over five random seeds.

Key Results

FedLECC achieves superior test accuracy compared to state-of-the-art baselines under severe label skew settings.
Benchmark	Metric	Baseline	This Paper	Δ
FMNIST	Test Accuracy	73.2	82.0	+8.8
FMNIST	Communication Rounds	150	117	-33
FMNIST	Total Communication Overhead	100	50	-50
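The percentages quoted in the highlights appear to follow from these table values when read as relative changes. The short check below makes that arithmetic explicit; treating the accuracy figure as a relative rather than absolute gain is an assumption, and `relative_change` is a hypothetical helper.

```python
def relative_change(baseline, value):
    """Relative change of value with respect to baseline, in percent."""
    return 100.0 * (value - baseline) / baseline


acc_gain = relative_change(73.2, 82.0)       # FMNIST test accuracy
round_change = relative_change(150, 117)     # communication rounds
overhead_change = relative_change(100, 50)   # total communication overhead
print(f"{acc_gain:.1f}% {round_change:.1f}% {overhead_change:.1f}%")
# prints: 12.0% -22.0% -50.0%
```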

Experiment Figures

Test accuracy vs. Communication Rounds for FedLECC and baselines on FMNIST with K=100 clients.

Main Takeaways

FedLECC consistently converges faster and reaches higher accuracy than uniform sampling (FedAvg) and single-factor selection (POC) under severe non-IID data.
The combination of clustering (diversity) and loss-guidance (informativeness) effectively mitigates client drift caused by label skew.
Significant reduction in communication overhead makes it suitable for bandwidth-constrained cloud-edge environments.
Outperforms purely regularization-based methods (FedProx, FedDyn) in the high non-IID regime by ensuring better data representation in each round.

📚 Prerequisite Knowledge

Prerequisites

Federated Learning fundamentals (FedAvg)
Non-IID data challenges (Label Skew)
Clustering algorithms (OPTICS)

Key Terms

FL: Federated Learning—distributed machine learning where models are trained across multiple devices without exchanging raw data

Non-IID: Non-Independent and Identically Distributed—data distributions that vary significantly across different clients

Label Skew: A specific type of non-IID data where clients have disjoint or highly imbalanced distributions of class labels

HD: Hellinger Distance, a metric bounded in [0, 1] that quantifies how different two probability distributions are (0 for identical distributions, 1 for distributions with no overlap)

FedAvg: Federated Averaging—the standard algorithm for FL where client updates are averaged to update the global model

Client Drift: The phenomenon where local models trained on heterogeneous data move apart from the global optimum

OPTICS: Ordering Points To Identify the Clustering Structure—a density-based clustering algorithm

Straggler: A client device that takes significantly longer to compute updates, slowing down the overall training round