Adaptive Model Pruning and Personalization for Federated Learning Over Wireless Networks

📝 Paper Summary

Federated Learning (FL), Resource-constrained Edge AI, Personalization

A wireless Federated Learning framework that reduces latency by pruning the shared global model while maintaining local personalized parameters to handle data heterogeneity.

Core Problem

Standard Federated Learning suffers from high communication and computation latency on resource-constrained devices, and from poor accuracy when data across devices are non-IID.

Why it matters:

Edge devices often have limited bandwidth and battery, making the transmission of large full models prohibitive.
Data heterogeneity (non-IID data) causes global models to drift, resulting in poor generalization and low accuracy for individual users.
Existing pruning methods often fail to account for the dynamic wireless channel conditions and data heterogeneity simultaneously.

Concrete Example: A mobile device with poor channel conditions trying to upload a full CNN model for image classification experiences high latency, slowing down the entire synchronized FL round. Additionally, a global model trained on diverse data may fail to recognize specific local patterns (e.g., specific handwriting styles in MNIST).

Key Novelty

Joint Adaptive Pruning and Personalization (JAPP-FL)

Splits the model into a personalized part (kept local) and a global part (shared but pruned), allowing devices to retain specific features while learning general representations.
Uses KKT conditions to mathematically derive closed-form solutions for the optimal pruning ratio and bandwidth allocation, balancing latency constraints against learning accuracy.

Architecture

The FL framework showing the split between Personalized Part (kept local) and Global Part (pruned and shared).

Evaluation Highlights

Reduces computation and communication latency by approximately 50% compared to FL with only partial model personalization.
Maintains comparable testing accuracy to unpruned personalized FL baselines on non-IID Fashion MNIST.
Achieves faster convergence in terms of global rounds compared to equal resource pruning schemes.

Breakthrough Assessment

7/10

Strong theoretical contribution with closed-form resource allocation and significant latency reduction. However, validation is limited to basic datasets (MNIST/Fashion MNIST) and standard CNNs.

⚙️ Technical Details

Problem Definition

Setting: Federated Learning over a wireless network with K edge devices, each having non-IID dataset D_k.

Inputs: Local datasets on edge devices, global model parameters broadcast by server.

Outputs: Optimized global model u and personalized local models v_k.

Pipeline Flow

Global Part Broadcasting (Server → Devices)
Local Personalization Update (Device)
Global Part Pruning & Update (Device)
Uplink Transmission of Pruned Global Part (Device → Server)
Global Part Aggregation (Server)
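
The five steps above can be sketched as one synchronous round on a toy problem. This is a minimal illustration, not the paper's implementation: the quadratic toy loss, the function names (`prune`, `grad`, `fl_round`), the choice to update the global part before pruning it, and the aggregation rule (average each weight over the devices that kept it) are all assumptions made for the example.

```python
def prune(weights, ratio):
    """Return a 0/1 mask that drops the `ratio` fraction of smallest-magnitude weights."""
    n_drop = int(ratio * len(weights))
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    dropped = set(ranked[:n_drop])
    return [0 if i in dropped else 1 for i in range(len(weights))]


def grad(u, v, target):
    """Gradient of the toy loss 0.5 * sum((u_i + v_i - t_i)^2); it is the same for u and v."""
    return [ui + vi - ti for ui, vi, ti in zip(u, v, target)]


def fl_round(u, v_locals, targets, ratios, lr=0.5):
    """Run one round; returns the new global part and the (in-place updated) local parts."""
    uploads = []
    for k in range(len(v_locals)):
        u_k = list(u)                                    # 1. server broadcasts global part
        g = grad(u_k, v_locals[k], targets[k])
        v_locals[k] = [vi - lr * gi                      # 2. personalized part update
                       for vi, gi in zip(v_locals[k], g)]
        g = grad(u_k, v_locals[k], targets[k])
        u_k = [ui - lr * gi for ui, gi in zip(u_k, g)]   # 3a. global part update
        mask = prune(u_k, ratios[k])                     # 3b. prune with device-specific ratio
        pruned = [ui * m for ui, m in zip(u_k, mask)]
        uploads.append((pruned, mask))                   # 4. uplink of pruned global part
    new_u = []
    for i in range(len(u)):                              # 5. server aggregation (assumed rule)
        kept = [u_k[i] for u_k, mask in uploads if mask[i]]
        new_u.append(sum(kept) / len(kept) if kept else 0.0)
    return new_u, v_locals


if __name__ == "__main__":
    u0 = [0.0, 0.0]
    v0 = [[0.0, 0.0], [0.0, 0.0]]
    targets = [[2.0, 0.0], [0.0, 4.0]]   # two devices with different local optima (non-IID)
    print(fl_round(u0, v0, targets, ratios=[0.5, 0.5]))
    # ([0.5, 1.0], [[1.0, 0.0], [0.0, 2.0]])
```

The personalized parts never leave the device; only the masked global part is uploaded, which is where the communication saving comes from.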

System Modules

Personalized Part Updater

Updates the device-specific model layers (e.g., convolutional layers) using local data.

Model or implementation: CNN Convolutional Layers

Adaptive Pruner

Calculates weight importance and applies a binary mask to the global model part to reduce size based on channel conditions.

Model or implementation: CNN Fully Connected Layers (Global Part)
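
As a rough illustration of the masking step, the sketch below scores each weight by its absolute value and builds a binary mask for a given pruning ratio. Magnitude is only one possible importance score and is assumed here for simplicity; the helper names are hypothetical, and the payload estimate ignores any cost of transmitting the mask or weight indices.

```python
def magnitude_mask(weights, pruning_ratio):
    """Binary mask: 0 for the smallest-magnitude weights, 1 for the weights that are kept."""
    n_prune = int(round(pruning_ratio * len(weights)))
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(ranked[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(weights))]


def payload_bits(mask, bits_per_weight=32):
    """Upload size in bits if only kept weights are sent (mask overhead not counted)."""
    return sum(mask) * bits_per_weight


if __name__ == "__main__":
    w = [0.8, -0.05, 0.3, -1.2, 0.01, 0.6]
    mask = magnitude_mask(w, pruning_ratio=0.5)
    print(mask)                # [1, 0, 0, 1, 0, 1]
    print(payload_bits(mask))  # 96
```

The default of 32 bits per weight matches the quantization setting listed under Key Hyperparameters.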

Bandwidth Allocator

Determines the optimal bandwidth fraction b_k for each device to meet latency constraints.

Model or implementation: Closed-form KKT solution
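
The paper's closed-form KKT solution is not reproduced in this summary. The sketch below only illustrates the underlying trade-off: given a bandwidth fraction b_k, a Shannon-type uplink rate, and a latency budget, it computes the smallest pruning ratio that lets the upload finish in time. The fixed-SNR rate model, the function names, the weight count, the SNR values, and the equal bandwidth split are assumptions for illustration; the 20 MHz bandwidth, 32-bit quantization, and 25 ms threshold are taken from the setup described here.

```python
import math


def uplink_rate(b_k, bandwidth_hz, snr):
    """Shannon rate in bit/s for bandwidth fraction b_k at a fixed linear SNR (simplification)."""
    return b_k * bandwidth_hz * math.log2(1.0 + snr)


def min_pruning_ratio(n_weights, bits, rate, t_budget):
    """Smallest ratio in [0, 1) such that uploading the kept weights takes at most t_budget s."""
    full_upload = n_weights * bits / rate   # seconds needed to upload the unpruned global part
    if full_upload <= t_budget:
        return 0.0                          # no pruning needed
    return 1.0 - t_budget / full_upload


if __name__ == "__main__":
    B = 20e6         # total bandwidth in Hz
    T_TH = 0.025     # latency budget in seconds, here spent entirely on the upload
    N = 100_000      # hypothetical number of global-part weights
    for snr in (15.0, 3.0):   # hypothetical good and poor channel
        r = uplink_rate(0.1, B, snr)        # equal split across 10 devices
        print(snr, round(min_pruning_ratio(N, 32, r, T_TH), 5))
```

With these numbers the device on the poorer channel needs a higher pruning ratio to meet the same deadline, which is the behavior the joint optimization exploits. Computation latency would shrink the upload budget further and is left out here.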

Novel Architectural Elements

Dynamic split of model into personalized (local-only) and global (shared-pruned) components where the pruning ratio is derived directly from wireless channel latency constraints.
Integration of KKT-based optimization directly into the FL loop to adjust the model size for each device in each round.

Modeling

Base Model: Convolutional Neural Network (CNN)

Training Method: Stochastic Gradient Descent (SGD) with alternating updates (LocalAlt)

Objective Functions:

Purpose: Minimize global loss under latency and bandwidth constraints.

Formally: min F(u, v_k) s.t. T_g ≤ T_th and Σ b_k ≤ 1.
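
Written out in LaTeX, the same problem might look as follows. The symbol \rho_k for the per-device pruning ratio, the explicit list of decision variables, and the box constraints on b_k and \rho_k are notational assumptions added here, not taken from the paper.

```latex
\begin{aligned}
\min_{u,\,\{v_k\},\,\{\rho_k\},\,\{b_k\}} \quad & F\bigl(u, \{v_k\}_{k=1}^{K}\bigr) \\
\text{s.t.} \quad & T_g \le T_{\mathrm{th}}, \\
& \sum_{k=1}^{K} b_k \le 1, \\
& 0 \le b_k \le 1, \quad 0 \le \rho_k \le 1, \quad k = 1, \dots, K.
\end{aligned}
```

Here T_g is the per-round latency, T_th the latency threshold, and b_k the bandwidth fraction assigned to device k.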

Trainable Parameters: Split into personalized (CNN layers) and global (Fully Connected layers).

Training Data:

MNIST (non-IID)
Fashion MNIST (non-IID)

Key Hyperparameters:

learning_rate: 0.001
batch_size: 128
quantization_bit: 32
latency_threshold: 25 ms

Compute: Edge server (GPU), 10 mobile devices (CPU frequency 3 GHz), 20 MHz total bandwidth.

Comparison to Prior Work

vs. FedAlt [11]: Adds adaptive pruning to reduce communication cost by ~50% while maintaining accuracy.
vs. PruneFL [14]: Explicitly handles non-IID data via the personalized model split, whereas PruneFL focuses on resource-limited devices.
vs. FL with Pruning [15]: Optimizes pruning ratio dynamically based on wireless channel state and latency constraints using KKT, rather than fixed or heuristic pruning.

Limitations

Evaluation limited to simple CNNs on MNIST/Fashion MNIST; no large-scale Transformer or NLP tasks.
Assumes synchronous FL, where the slowest device bottlenecks the round (handled via pruning, but still a constraint).
Requires perfect channel state information (CSI) for the optimization step.
Pruning is unstructured, which may not translate to actual speedup on all hardware types.

Reproducibility

No code repository provided. Simulation parameters (channel power, bandwidth, CPU frequency) are listed in Section V. Dataset splits (non-IID construction) are described generally but exact seed/distribution details are sparse.

📊 Experiments & Results

Evaluation Setup

Wireless FL simulation with 1 edge server and 10 devices.

Benchmarks:

MNIST (image classification, non-IID)
Fashion MNIST (image classification, non-IID)

Metrics:

Testing Accuracy
Global Loss
Computation Latency
Communication Latency

Statistical methodology: Not explicitly reported in the paper

Key Results

Comparison of latency and accuracy against baselines on Fashion MNIST shows efficiency gains.
Benchmark	Metric	Baseline	This Paper	Δ
Fashion MNIST	Latency per round (ms)	55	25	-30
Fashion MNIST	Testing Accuracy	0.82	0.88	+0.06
Fashion MNIST	Communication Cost (model weights)	2.0e7	1.0e7	-1.0e7
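
The Δ column gives absolute differences. To relate the rows to the "approximately 50%" figure quoted in the highlights, the relative changes can be computed directly from the baseline and proposed values in the table. The dictionary keys and helper name below are illustrative.

```python
def relative_change(baseline, ours):
    """Signed relative change of `ours` with respect to `baseline`."""
    return (ours - baseline) / baseline


rows = {
    "latency per round": (55.0, 25.0),
    "testing accuracy": (0.82, 0.88),
    "communication cost": (2.0e7, 1.0e7),
}

if __name__ == "__main__":
    for name, (base, ours) in rows.items():
        print(f"{name}: {relative_change(base, ours):+.1%}")
    # latency per round: -54.5%
    # testing accuracy: +7.3%
    # communication cost: -50.0%
```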

Experiment Figures

Loss, Accuracy, and Communication Cost comparison against baselines.

Effect of Latency Thresholds (15 ms to 30 ms) on accuracy and pruning ratio.

Main Takeaways

Joint optimization of pruning and bandwidth allows devices with poor channels to prune more aggressively, satisfying latency constraints without stalling the network.
Partial personalization (keeping feature extraction local) effectively handles non-IID data, outperforming pure pruning methods on heterogeneous distributions.
The proposed method converges to similar accuracy as full-model methods but with significantly lower communication overhead and latency.

📚 Prerequisite Knowledge

Prerequisites

Federated Learning (FedAvg)
Convex Optimization (KKT conditions)
Wireless Communication models (OFDMA, Shannon capacity)

Key Terms

non-IID: Non-Independent and Identically Distributed data—data distributions vary significantly across different devices (e.g., different classes of images).

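KKT conditions: Karush-Kuhn-Tucker conditions, the first-order conditions that an optimal solution of a constrained nonlinear program must satisfy under suitable constraint qualifications; for convex problems they are also sufficient. Used here to solve for the bandwidth allocation and pruning ratios.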

OFDMA: Orthogonal Frequency-Division Multiple Access, a multi-user access scheme built on OFDM that assigns subsets of subcarriers to individual users.

LocalAlt: An update strategy where personalized and global parameters are updated alternately rather than simultaneously.

Unstructured pruning: Removing individual weights based on importance (magnitude or gradient impact) rather than removing whole structures like filters or layers.

FL: Federated Learning—training a machine learning model across multiple decentralized edge devices holding local data samples.