Adaptive client selection with personalization for communication efficient Federated Learning

📝 Paper Summary

Federated Learning, Edge Computing, Efficient Communication

ACSP-FL reduces federated learning communication costs by dynamically selecting fewer clients based on performance and sharing only partial global model layers while keeping local personalized layers.

Core Problem

Standard Federated Learning (FL) suffers from high communication overhead and slow convergence, especially when client data is non-IID and heterogeneous.

Why it matters:

Communication bottlenecks prevent FL scalability on edge devices with limited bandwidth
Fixed client selection strategies (like selecting k random clients) are inefficient, often selecting poor-performing or redundant clients
Full model transmission is costly; transmitting unnecessary parameters wastes energy and network resources

Concrete Example: In Human Activity Recognition, a client with only 'walking' data (non-IID) might receive a global model trained mostly on 'sitting' data. Standard FL forces this client to download the full global model and upload full updates, wasting bandwidth on parameters irrelevant to its local distribution, while still failing to recognize 'walking' accurately due to lack of personalization.

Key Novelty

Adaptive Client Selection with Personalization (ACSP-FL)

Filters clients dynamically: only selects clients whose current accuracy is below the global average, prioritizing those who need training the most
Applies a decay function to gradually reduce the total number of participating clients as the global model converges
Splits the model into shared global layers (collaboratively trained) and private local layers (personalized), transmitting only the shared portion to reduce payload

Architecture

The ACSP-FL workflow: (a) Personalization Phase (each client combines the received global layers with its stored local layers), (b) Evaluation Phase (clients test the combined model and report accuracy), (c) Selection Phase (the server selects the clients whose accuracy is below the mean).

Evaluation Highlights

Reduces communication overhead by up to 95% compared to FedAvg while maintaining comparable accuracy
Cuts convergence time roughly threefold: on UCI-HAR, ACSP-FL finishes 100 rounds in ~500s vs ~1500s for FedAvg
Achieves superior efficiency scores (weighted accuracy + overhead reduction) across IID and non-IID datasets compared to POC and DEEV

Breakthrough Assessment

7/10

Solid engineering combination of adaptive selection and partial model sharing, showing very strong efficiency gains (90-95% reduction in communication overhead). While the individual components are known, the specific integration and the rigorous container-based evaluation are valuable contributions.

⚙️ Technical Details

Problem Definition

Setting: Federated Learning over N heterogeneous clients with non-IID local datasets, aiming to minimize global loss while reducing communication bits

Inputs: Local datasets D_i on client devices (accelerometer/gyroscope time-series)

Outputs: Global shared model parameters w_g and local personalized parameters w_l for each client

Pipeline Flow

Server: Distribute Partial Model
Client: Personalize & Train
Client: Evaluate & Report
Server: Adaptive Selection

System Modules

Model Splitter (Server-side)

Defines which layers are global (shared) and which are local (private)

Model or implementation: Function K(w, L)
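
A minimal sketch of what a splitter like K(w, L) could look like, assuming L is the number of leading layers to share and that a model is an ordered list of flattened layers. The name `split_model` and the list-based representation are illustrative choices, not taken from the paper.

```python
from typing import List, Tuple

Layer = List[float]   # flattened weights of one layer (illustrative)
Model = List[Layer]   # ordered list of layers, input side first


def split_model(w: Model, num_shared: int) -> Tuple[Model, Model]:
    """Hypothetical stand-in for K(w, L).

    ASSUMPTION: the first `num_shared` layers are the shared (global) part
    and the remaining layers are the private (local) part.
    """
    if not 0 <= num_shared <= len(w):
        raise ValueError("num_shared must be between 0 and the number of layers")
    return w[:num_shared], w[num_shared:]


w = [[0.1, 0.2], [0.3], [0.4, 0.5], [0.6]]
w_g, w_l = split_model(w, 2)   # only w_g would be sent over the network
```

Choosing a small `num_shared` shrinks the payload but leaves less of the model to be trained collaboratively.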

Personalization Combiner

Combines received global layers with stored local layers to form a full trainable model

Model or implementation: Concatenation w_i = [w_g, w_l_i]
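
The concatenation w_i = [w_g, w_l_i] can be sketched as below, assuming the same list-of-layers representation on the client. The function name `combine` and the sample values are hypothetical.

```python
from typing import List

Layer = List[float]   # flattened weights of one layer (illustrative)


def combine(w_g: List[Layer], w_l_i: List[Layer]) -> List[Layer]:
    """Concatenate the shared layers received from the server with the
    private layers this client keeps on-device."""
    return list(w_g) + list(w_l_i)


received_global = [[1.0, 2.0], [3.0]]   # sent by the server
stored_local = [[9.0]]                   # never leaves the client
w_i = combine(received_global, stored_local)
```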

Adaptive Selector (Server-side)

Selects subset of clients for next round based on performance

Model or implementation: Filtering function π(·) + Decay function
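
A rough sketch of the filter-then-decay idea follows. The summary does not give the exact decay formula, so the form `(1 - alpha) ** round_idx`, the lowest-accuracy-first ordering, and the name `select_clients` are assumptions made for illustration only.

```python
import math
from typing import Dict, List


def select_clients(accuracies: Dict[str, float], round_idx: int, alpha: float) -> List[str]:
    """Pick the clients to train in the next round.

    Step 1 (filter): keep clients whose accuracy is strictly below the mean.
    Step 2 (decay): keep only a shrinking share of them as rounds progress.
    ASSUMPTION: the share is (1 - alpha) ** round_idx; the paper's decay
    function may have a different form.
    """
    if not accuracies:
        return []
    mean_acc = sum(accuracies.values()) / len(accuracies)
    below = [c for c, acc in accuracies.items() if acc < mean_acc]
    below.sort(key=lambda c: accuracies[c])   # weakest clients first (assumption)
    keep = math.ceil(len(below) * (1.0 - alpha) ** round_idx)
    return below[:keep]


accs = {"a": 0.9, "b": 0.5, "c": 0.6, "d": 0.8, "e": 0.4}
round0 = select_clients(accs, round_idx=0, alpha=0.5)
round1 = select_clients(accs, round_idx=1, alpha=0.5)
```

With this form, a larger alpha drops clients faster. If every client reports the same accuracy, nobody falls below the mean and the selection is empty, so a real system would need a fallback for that case.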

Novel Architectural Elements

Integration of performance-based filtering (select under-performers) with exponential decay of client participation count
Hybrid model architecture explicitly splitting MLP into shared feature extractor and local personalized classifier layers within the FL loop

Modeling

Base Model: Multilayer Perceptron (MLP) with 3 hidden layers (256 units each)
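
To get a feel for how much payload partial sharing could save with this architecture, the sketch below counts the parameters per dense layer. The input size (561 features) and class count (6) match the standard UCI-HAR feature-vector release but are assumptions here, as is the split point; the paper's preprocessing and chosen split may differ.

```python
from typing import List


def dense_param_counts(layer_sizes: List[int]) -> List[int]:
    """Weights plus biases for each fully connected layer."""
    return [
        fan_in * fan_out + fan_out
        for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:])
    ]


# ASSUMPTION: 561 inputs and 6 output classes; three hidden layers of 256 units.
sizes = [561, 256, 256, 256, 6]
counts = dense_param_counts(sizes)
total = sum(counts)

# ASSUMPTION: only the first num_shared layers are transmitted (hypothetical split).
num_shared = 2
shared = sum(counts[:num_shared])
print(f"total={total}, shared={shared}, payload fraction={shared / total:.2%}")
```

The fraction applies to each message in both directions, since only the shared layers are downloaded and uploaded.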

Training Method: Federated Learning with Partial Model Sharing (custom loop)

Objective Functions:

Purpose: Minimize classification loss on local data.

Formally: min L(w_i) where w_i is composed of global and local parameters.
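
One way to write this objective out in full, assuming FedAvg-style weighting by local dataset size (the summary only states min L(w_i), so the weighting and the subscript notation w_{l,i} are assumptions):

```latex
\min_{w_g,\; w_{l,1},\dots,w_{l,N}} \;\;
\sum_{i=1}^{N} \frac{|D_i|}{\sum_{j=1}^{N} |D_j|} \;
\mathcal{L}\big(w_i;\, D_i\big),
\qquad w_i = [\, w_g,\; w_{l,i} \,]
```

Here w_g is common to all N clients, while each w_{l,i} is optimized only on client i's data D_i.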

Adaptation: Personalization via local layers (w_l) that are never shared

Trainable Parameters: Full MLP trained locally; only subset w_g aggregated globally
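
Aggregating only the shared subset w_g might look like the sketch below, which applies a sample-weighted FedAvg-style average to the shared layers and never touches local layers. The function name and the weighting by sample count are assumptions; the paper's aggregation rule may differ.

```python
from typing import List

Layer = List[float]   # flattened weights of one layer (illustrative)


def aggregate_shared(client_shared: List[List[Layer]], num_samples: List[int]) -> List[Layer]:
    """Sample-weighted average of the shared layers uploaded by each client.

    ASSUMPTION: every client uploads the same number of shared layers with
    matching sizes; local layers are not part of the input at all.
    """
    total = sum(num_samples)
    aggregated = []
    for layer_idx in range(len(client_shared[0])):
        size = len(client_shared[0][layer_idx])
        aggregated.append([
            sum(n * w[layer_idx][k] for w, n in zip(client_shared, num_samples)) / total
            for k in range(size)
        ])
    return aggregated


# Two clients, one shared layer each, with 1 and 3 local samples.
updates = [[[1.0, 2.0]], [[3.0, 4.0]]]
new_w_g = aggregate_shared(updates, [1, 3])
```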

Training Data:

UCI-HAR (30 users)
MotionSense (24 users)
ExtraSensory (60 users)

Key Hyperparameters:

hidden_layers: 3
units_per_layer: 256
local_epochs: Not explicitly reported in the paper
loss_function: Sparse Categorical Crossentropy
optimizer: SGD

Compute: Simulated on Docker Swarm with VM managers and workers; exact GPU/CPU specs not reported

Comparison to Prior Work

vs. POC/Oort: ACSP-FL adapts the *number* of clients (decay) rather than just *which* clients, preventing over-selection in later rounds
vs. DEEV: ACSP-FL adds partial model sharing (personalization) to further reduce payload [cited in paper]
vs. FedAvg: Deterministic performance-based selection instead of random sampling

Limitations

The decay function parameter (alpha) needs manual tuning, and its sensitivity is not explored in depth
Partial model sharing might limit the global model's ability to learn complex features if the shared portion is too small
Evaluation limited to simple MLP models on sensor data; applicability to large LLMs or CNNs not tested

Reproducibility

Code: https://github.com/AllanMSouza/ACSP-FL

The paper uses standard datasets (UCI-HAR, MotionSense, ExtraSensory), and its detailed description of the Docker environment enables replication of the system setup.

📊 Experiments & Results

Evaluation Setup

Human Activity Recognition (HAR) on heterogeneous mobile data

Benchmarks:

UCI-HAR: activity classification (IID)
MotionSense: activity classification (IID)
ExtraSensory: activity classification (non-IID)

Metrics:

Distributed accuracy
Communication Overhead (latency reduction vs FedAvg)
Convergence time (s)
Data transmitted (TX bytes)
Statistical methodology: Not explicitly reported in the paper

Key Results

Benchmark	Metric	Baseline (FedAvg)	ACSP-FL	Δ
UCI-HAR	Communication Overhead Reduction (%)	0	95	+95
MotionSense	Communication Overhead Reduction (%)	0	90	+90
UCI-HAR	Convergence Time, 100 rounds (s)	~1500	~500	~-1000
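
As a quick check on the convergence-time row, using the approximate figures reported above:

```latex
\text{speedup} \approx \frac{1500\,\text{s}}{500\,\text{s}} = 3\times,
\qquad
\text{time saved} \approx \frac{1500 - 500}{1500} \approx 66.7\%
```

Both input values are approximate, so these ratios are indicative rather than exact.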

Experiment Figures

Decay function visualization: Number of clients selected vs. Communication Rounds for different decay parameter (alpha) values.

Model Splitting Strategy: Neural network diagram showing shared layers vs. personalized layers.

Main Takeaways

ACSP-FL significantly lowers communication costs (up to 95% reduction) by transmitting fewer parameters and selecting fewer clients.
The adaptive selection strategy prevents 'over-training' by gradually removing clients that have already converged effectively.
Personalization (partial sharing) allows the system to maintain high accuracy even on non-IID data (ExtraSensory) where standard FedAvg might struggle.
Container-based evaluation confirms real-world benefits in latency and resource usage, not just theoretical round reduction.

📚 Prerequisite Knowledge

Prerequisites

Federated Learning (FedAvg algorithm)
Stochastic Gradient Descent (SGD)
Basic Neural Networks (MLP)

Key Terms

FedAvg: Federated Averaging—the standard algorithm for FL where clients train locally and a server averages their weights

Non-IID: Non-Independent and Identically Distributed—data distribution varies across clients (e.g., one user only walks, another only sits)

Client Drift: The phenomenon where local models move away from the global optimal solution due to non-IID data

Epoch: One complete pass through the training dataset

MLP: Multilayer Perceptron—a basic feedforward artificial neural network

gRPC: A high-performance open-source universal RPC framework used for client-server communication

Docker Swarm: A container orchestration tool used here to simulate distributed clients on distinct virtual machines