A. M. Souza, Filipe Maciel, Joahannes Costa, L. Bittencourt, E. Cerqueira, A. A. Loureiro, Leandro A. Villas
Universidade Estadual de Campinas,
Federal University of Pará,
Federal University of Minas Gerais,
Federal University of Ceará
Ad hoc networks
(2024)
P13N, Benchmark
📝 Paper Summary
Federated Learning, Edge Computing, Efficient Communication
ACSP-FL reduces federated learning communication costs by dynamically selecting fewer clients based on performance and sharing only partial global model layers while keeping local personalized layers.
Core Problem
Standard Federated Learning (FL) suffers from high communication overhead and slow convergence, especially when client data is non-IID and heterogeneous.
Why it matters:
Communication bottlenecks prevent FL scalability on edge devices with limited bandwidth
Fixed client selection strategies (like selecting k random clients) are inefficient, often selecting poor-performing or redundant clients
Full model transmission is costly; transmitting unnecessary parameters wastes energy and network resources
Concrete Example: In Human Activity Recognition, a client with only 'walking' data (non-IID) might receive a global model trained mostly on 'sitting' data. Standard FL forces this client to download the full global model and upload full updates, which wastes bandwidth on parameters irrelevant to its local distribution. Without personalization, the client still fails to recognize 'walking' accurately.
Key Novelty
Adaptive Client Selection with Personalization (ACSP-FL)
Filters clients dynamically: only selects clients whose current accuracy is below the global average, prioritizing those who need training the most
Applies a decay function to gradually reduce the total number of participating clients as the global model converges
Splits the model into shared global layers (collaboratively trained) and private local layers (personalized), transmitting only the shared portion to reduce payload
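The first two ideas above (performance filter plus decay) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `select_clients`, the `decay` parameter and its default, and the exact form `(1 - decay) ** round_idx` are assumptions. The summary states only that a decay function, later described as exponential, shrinks the number of participating clients.

```python
import math


def select_clients(accuracies, round_idx, decay=0.05):
    """Pick the clients to train in the next round.

    accuracies: dict mapping client id -> accuracy reported this round.
    round_idx:  zero-based communication round.
    decay:      hypothetical decay rate in [0, 1); not taken from the paper.
    """
    if not accuracies:
        return []
    mean_acc = sum(accuracies.values()) / len(accuracies)
    # Performance filter: keep only clients below the mean accuracy.
    candidates = [cid for cid, acc in accuracies.items() if acc < mean_acc]
    # Worst performers first; ties broken by id so the result is deterministic.
    candidates.sort(key=lambda cid: (accuracies[cid], cid))
    # Assumed exponential decay of the number of clients kept.
    keep = math.ceil(len(candidates) * (1.0 - decay) ** round_idx)
    return candidates[:keep]


reported = {"c1": 0.90, "c2": 0.50, "c3": 0.60, "c4": 0.80}
early = select_clients(reported, round_idx=0)   # both below-mean clients
late = select_clients(reported, round_idx=50)   # only the weakest client remains
```

Note that if every client reports the same accuracy, no client falls below the mean and the sketch returns an empty list; a real system would need a fallback for that case.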
Architecture
The ACSP-FL workflow: (a) Personalization Phase (clients combine the global and local models), (b) Evaluation Phase (clients test and report accuracy), (c) Selection Phase (the server keeps only clients whose accuracy is below the mean).
Evaluation Highlights
Reduces communication overhead by up to 95% compared to FedAvg while maintaining comparable accuracy
Cuts convergence time roughly threefold: on UCI-HAR, ACSP-FL finishes 100 rounds in ~500 s versus ~1500 s for FedAvg
Achieves superior efficiency scores (weighted accuracy + overhead reduction) across IID and non-IID datasets compared to POC and DEEV
Breakthrough Assessment
7/10
Solid engineering combination of adaptive selection and partial model sharing, with very strong efficiency gains (90-95% reduction in communication overhead). While the individual components are known, the specific integration and the rigorous container-based evaluation are valuable contributions.
⚙️ Technical Details
Problem Definition
Setting: Federated Learning over N heterogeneous clients with non-IID local datasets, aiming to minimize global loss while reducing communication bits
Inputs: Local datasets D_i on client devices (accelerometer/gyroscope time-series)
Outputs: Global shared model parameters w_g and local personalized parameters w_l for each client
Pipeline Flow
Server: Distribute Partial Model
Client: Personalize & Train
Client: Evaluate & Report
Server: Adaptive Selection
System Modules
Model Splitter (Server-side)
Defines which layers are global (shared) and which are local (private)
Model or implementation: Function K(w, L)
Personalization Combiner
Combines received global layers with stored local layers to form a full trainable model
Model or implementation: Concatenation w_i = [w_g, w_l_i]
Adaptive Selector (Server-side)
Selects subset of clients for next round based on performance
Model or implementation: Filtering function π(·) + Decay function
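The splitter and combiner modules can be illustrated with plain dictionaries keyed by layer name. This is a sketch under stated assumptions: `split_model`, `combine_model`, the layer names, and the choice of which layers are shared are all hypothetical, and the toy weight lists stand in for real tensors.

```python
def split_model(weights, shared_names):
    """Partition a layer dict into (shared, local), in the spirit of K(w, L)."""
    shared = {n: w for n, w in weights.items() if n in shared_names}
    local = {n: w for n, w in weights.items() if n not in shared_names}
    return shared, local


def combine_model(global_part, local_part, layer_order):
    """Rebuild the full model w_i = [w_g, w_l_i] in the original layer order."""
    merged = {**local_part, **global_part}
    return {name: merged[name] for name in layer_order}


layer_order = ["hidden1", "hidden2", "hidden3", "output"]
client_model = {name: [0.0, 0.0] for name in layer_order}  # toy weights

# Assumed split: the first two layers are shared, the rest stay on the client.
shared, local = split_model(client_model, shared_names={"hidden1", "hidden2"})

# The server sends updated shared layers; the private layers never leave the client.
received = {"hidden1": [1.0, 1.0], "hidden2": [2.0, 2.0]}
full = combine_model(received, local, layer_order)
```

Only the `shared` dictionary would travel over the network in each round, which is where the payload reduction comes from.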
Novel Architectural Elements
Integration of performance-based filtering (select under-performers) with exponential decay of client participation count
Hybrid model architecture explicitly splitting MLP into shared feature extractor and local personalized classifier layers within the FL loop
Modeling
Base Model: Multilayer Perceptron (MLP) with 3 hidden layers (256 units each)
Training Method: Federated Learning with Partial Model Sharing (custom loop)
Objective Functions:
Purpose: Minimize classification loss on local data.
Formally: min L(w_i) where w_i is composed of global and local parameters.
Adaptation: Personalization via local layers (w_l) that are never shared
Trainable Parameters: Full MLP trained locally; only subset w_g aggregated globally
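A minimal sketch of what aggregating only w_g could look like on the server. It assumes FedAvg-style weighting by local sample count, which this summary does not confirm for ACSP-FL; the function name `aggregate_shared` and the flat-list layer representation are illustrative.

```python
def aggregate_shared(updates):
    """Weighted average of the shared layers only.

    updates: list of (shared_layers, num_samples) pairs, where shared_layers
             maps layer name -> flat list of floats. Local layers are never
             part of an update, so they cannot be averaged by construction.
    """
    total = sum(n for _, n in updates)
    if total == 0:
        raise ValueError("no training samples reported")
    first_layers = updates[0][0]
    aggregated = {}
    for name, values in first_layers.items():
        aggregated[name] = [
            sum(layers[name][i] * n for layers, n in updates) / total
            for i in range(len(values))
        ]
    return aggregated


updates = [
    ({"hidden1": [1.0, 2.0]}, 10),  # client with 10 samples
    ({"hidden1": [3.0, 4.0]}, 30),  # client with 30 samples
]
new_global = aggregate_shared(updates)
```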
Training Data:
UCI-HAR (30 users)
MotionSense (24 users)
ExtraSensory (60 users)
Key Hyperparameters:
hidden_layers: 3
units_per_layer: 256
local_epochs: Not explicitly reported in the paper
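To get a feel for how much payload partial sharing can save, the parameter count of this MLP can be worked out per layer. The sketch below assumes the public UCI-HAR dimensions (561 input features, 6 activity classes); the paper's actual input representation and the number of shared layers are not stated in this summary, so the split shown is hypothetical.

```python
def dense_params(fan_in, fan_out):
    """Weights plus biases of one fully connected layer."""
    return fan_in * fan_out + fan_out


def mlp_layer_params(input_dim, hidden_units, hidden_layers, num_classes):
    """Per-layer parameter counts for an MLP, first hidden layer to output layer."""
    dims = [input_dim] + [hidden_units] * hidden_layers + [num_classes]
    return [dense_params(a, b) for a, b in zip(dims, dims[1:])]


# Assumed dimensions: 561 features and 6 classes (public UCI-HAR feature set).
per_layer = mlp_layer_params(
    input_dim=561, hidden_units=256, hidden_layers=3, num_classes=6
)
total = sum(per_layer)

# Hypothetical splits: share the first n_shared layers, keep the rest local.
for n_shared in range(1, len(per_layer) + 1):
    shared = sum(per_layer[:n_shared])
    print(f"shared layers={n_shared}: {shared} of {total} params ({shared / total:.1%})")
```

Per-round payload scales with the shared parameter count times the number of selected clients, so both mechanisms in ACSP-FL reduce traffic multiplicatively.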
Code is publicly available at https://github.com/AllanMSouza/ACSP-FL. The paper uses standard datasets (UCI-HAR, MotionSense, ExtraSensory). Docker environment description is detailed, enabling replication of the system setup.
📊 Experiments & Results
Evaluation Setup
Human Activity Recognition (HAR) on heterogeneous mobile data
Benchmarks:
UCI-HAR: activity classification (IID)
MotionSense: activity classification (IID)
ExtraSensory: activity classification (non-IID)
Metrics:
Distributed accuracy
Communication Overhead (latency reduction vs FedAvg)
Convergence time (s)
Data transmitted (TX bytes)
Statistical methodology: Not explicitly reported in the paper
Key Results
Benchmark | Metric | Baseline (FedAvg) | This Paper | Δ
UCI-HAR | Communication Overhead Reduction (%) | 0 | 95 | 95
MotionSense | Communication Overhead Reduction (%) | 0 | 90 | 90
UCI-HAR | Convergence Time, 100 rounds (s) | 1500 | 500 | -1000
Experiment Figures
Decay function visualization: Number of clients selected vs. Communication Rounds for different decay parameter (alpha) values.
Model Splitting Strategy: Neural network diagram showing shared layers vs. personalized layers.
Main Takeaways
ACSP-FL significantly lowers communication costs (up to 95% reduction) by transmitting fewer parameters and selecting fewer clients.
The adaptive selection strategy prevents 'over-training' by gradually removing clients that have already converged effectively.
Personalization (partial sharing) allows the system to maintain high accuracy even on non-IID data (ExtraSensory) where standard FedAvg might struggle.
Container-based evaluation confirms real-world benefits in latency and resource usage, not just theoretical round reduction.
📚 Prerequisite Knowledge
Prerequisites
Federated Learning (FedAvg algorithm)
Stochastic Gradient Descent (SGD)
Basic Neural Networks (MLP)
Key Terms
FedAvg: Federated Averaging, the standard FL algorithm in which clients train locally and a server averages their weights
Non-IID: Non-Independent and Identically Distributed; the data distribution varies across clients (e.g., one user only walks, another only sits)
Client Drift: The phenomenon where local models move away from the global optimum due to non-IID data
Epoch: One complete pass through the training dataset