Personalized and privacy-preserving federated heterogeneous medical image analysis with PPPML-HMI

📝 Paper Summary

Federated Learning in Medical Imaging
Privacy-Preserving Machine Learning
Personalized Federated Learning

PPPML-HMI combines personalized federated learning with a novel cyclic secure aggregation protocol using homomorphic encryption to enable collaborative medical image analysis across hospitals with heterogeneous data and strict privacy requirements.

Core Problem

Standard federated learning fails on heterogeneous medical data (non-IID) due to model drift and remains vulnerable to gradient leakage attacks that can reconstruct private patient images.

Why it matters:

Hospitals use diverse CT scanners and settings, creating heterogeneous data where a single global model performs poorly locally
Medical data is highly sensitive; sharing raw data is often legally impossible, and standard FL gradients can still leak private information via reconstruction attacks
Current solutions rarely address both personalization (for accuracy) and rigorous cryptographic privacy (for security) simultaneously in an open-source framework

Concrete Example: In the paper's case study, a model trained on Hospital C's data fails completely when applied to Hospital D (Dice score 0.28). Standard FedAvg also underperforms on Hospital D (Dice 0.39) compared to local training (Dice 0.46) due to data heterogeneity.

Key Novelty

Personalized & Privacy-Preserving Federated Learning (PPPML-HMI)

Integrates Per-FedAvg (meta-learning) to train a highly adaptable global model that users fine-tune locally, solving the heterogeneity problem
Replaces the central server's aggregation role with a decentralized Cyclic Secure Aggregation loop where users pass homomorphically encrypted gradients, preventing the server from ever seeing raw updates

Architecture

High-level schematic of PPPML-HMI showing the personalized FL process and the Cyclic Secure Aggregation with Homomorphic Encryption (CSAHE) loop.

Evaluation Highlights

Achieved ~5% higher average Dice score on real-world heterogeneous COVID-19 segmentation compared to conventional FedAvg
Blocked Improved Deep Leakage from Gradients (iDLG) attacks, preventing reconstruction of private CT images, whereas FedAvg leaked them
Outperformed independent local training for hospitals with distinct data distributions (e.g., Hospital D: Dice 0.51 vs 0.46)

Breakthrough Assessment

8/10

A solid practical contribution that combines personalization with cryptographic privacy for a critical medical task. It demonstrates effectiveness on real-world heterogeneous clinical data, though the algorithmic components (Per-FedAvg, HE) are each already known.

⚙️ Technical Details

Problem Definition

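Setting: Federated learning with N users, each holding a private non-IID dataset D_i. Goal: minimize the meta-learning objective F(w) = (1/N) * sum_{i=1..N} f_i(w - alpha * grad(f_i(w))), where f_i is user i's loss on D_i and alpha is the local adaptation step size, while protecting the shared gradients.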
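To make the meta-learning objective concrete, the following minimal Python sketch evaluates F(w), averaged over users as in the Per-FedAvg objective, for toy one-dimensional quadratic losses. The quadratic losses, the per-user optima, and all function names are illustrative assumptions, not the paper's models or code.

```python
from typing import Callable, Sequence

Loss = Callable[[float], float]
Grad = Callable[[float], float]


def meta_objective(w: float, losses: Sequence[Loss], grads: Sequence[Grad],
                   alpha: float) -> float:
    """F(w) = (1/N) * sum_i f_i(w - alpha * grad f_i(w))."""
    total = 0.0
    for f, g in zip(losses, grads):
        adapted = w - alpha * g(w)   # one local gradient step for this user
        total += f(adapted)          # loss measured after adaptation
    return total / len(losses)


def make_quadratic(c: float):
    """Hypothetical user loss f(w) = 0.5 * (w - c)^2 with optimum at c."""
    return (lambda w: 0.5 * (w - c) ** 2), (lambda w: w - c)


# Made-up per-user optima standing in for heterogeneous (non-IID) users.
centers = [-1.0, 0.0, 4.0]
losses, grads = zip(*(make_quadratic(c) for c in centers))

f_at_mean = meta_objective(1.0, losses, grads, alpha=0.5)  # w at the mean optimum
f_at_zero = meta_objective(0.0, losses, grads, alpha=0.5)  # w at one user's optimum
```

In this toy setting the objective is lower at the mean of the user optima than at any single user's optimum, which is the sense in which the global model is an initialization meant to be adapted rather than a final model.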

Inputs: 3D CT scans from different hospitals (heterogeneous in scanner type, slice thickness)

Outputs: Personalized segmentation maps (lung infection regions) or classification labels

Pipeline Flow

Server broadcasts global model w_k
Local Training: Users update model via Per-FedAvg (meta-learning step)
CSAHE Loop: Users aggregate gradients in a secure ring topology using Homomorphic Encryption
Server Update: The aggregated encrypted gradient is decrypted via the initiator, and the server uses the aggregate to update the global model
Personalization: Final local adaptation steps on user data
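The flow above can be sketched end to end on a toy problem. This is a simplified illustration under stated assumptions: each user has a scalar quadratic loss 0.5 * (w - c)^2 (so the Hessian is 1), the CSAHE loop is replaced by a plain sum placeholder, and every name and hyperparameter value here is hypothetical rather than taken from the paper's code.

```python
def local_meta_gradient(w: float, c: float, alpha: float) -> float:
    """Local Training: Per-FedAvg gradient for f(w) = 0.5 * (w - c)^2."""
    adapted = w - alpha * (w - c)            # inner adaptation step
    return (1.0 - alpha) * (adapted - c)     # (1 - alpha * H) * grad f(adapted), H = 1


def secure_sum(grads):
    """CSAHE Loop placeholder: a plain sum stands in for the encrypted ring."""
    return sum(grads)


def train(centers, rounds=200, alpha=0.5, lr=0.5) -> float:
    w = 0.0                                  # server's initial global model
    for _ in range(rounds):
        # Server broadcasts w; each user computes its local meta-gradient.
        grads = [local_meta_gradient(w, c, alpha) for c in centers]
        # Server Update: apply the averaged aggregate.
        w -= lr * secure_sum(grads) / len(centers)
    return w


def personalize(w: float, c: float, alpha: float, steps: int = 1) -> float:
    """Personalization: a few local gradient steps on the user's own loss."""
    for _ in range(steps):
        w -= alpha * (w - c)
    return w


centers = [-1.0, 0.0, 4.0]                   # made-up per-user optima
w_global = train(centers)
w_user = personalize(w_global, centers[2], alpha=0.5)
```

Here the global model settles near the mean of the user optima, and a single personalization step moves it halfway toward the third user's optimum.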

System Modules

Local Trainer

Compute local meta-gradients using private data

Model or implementation: 3D DenseNet (Classification) or 2.5D U-Net (Segmentation)

CSAHE Aggregator

Securely sum gradients across users without revealing individual contributions

Model or implementation: CKKS Homomorphic Encryption scheme via TenSEAL library

Novel Architectural Elements

Decentralized 'encryption-summation' loop (CSAHE) integrated directly into the FL round, replacing central server aggregation
Integration of Per-FedAvg meta-learning objective with homomorphic encryption constraints

Modeling

Base Model: 2.5D U-Net (Segmentation) / 3D DenseNet (Classification)

Training Method: Personalized Federated Averaging (Per-FedAvg) with Cyclic Secure Aggregation

Objective Functions:

Purpose: Find a global initialization that adapts fast.

Formally: min_w F(w) = (1/N) * sum_{i=1..N} f_i(w - alpha * grad(f_i(w)))

Purpose: Securely aggregate gradients.

Formally: Encrypted_Sum = HE(gradient_initiator + Noise) + HE(gradient_user1) + ... + HE(gradient_userN)
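The encrypted sum can be illustrated with a small, self-contained Python sketch. Note the substitutions: the paper uses CKKS via TenSEAL, whereas this sketch uses a toy Paillier scheme (also additively homomorphic) with tiny keys so that it runs on the standard library alone. It also assumes that the initiator holds the key pair, decrypts the final ciphertext, and removes its own mask; those details, the noise value of 150, and all names are assumptions made for illustration and are not secure for real use.

```python
import math
import random

# Toy Paillier parameters built from two Mersenne primes (far too small for real use).
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)   # lcm(p - 1, q - 1)
MU = pow(LAM, -1, N)                                 # modular inverse (Python 3.8+)
SCALE = 10**6                                        # fixed-point precision


def encrypt(x: float, rng: random.Random) -> int:
    m = round(x * SCALE) % N                         # signed fixed-point into Z_N
    r = rng.randrange(1, N)
    while math.gcd(r, N) != 1:                       # r must be a unit mod N
        r = rng.randrange(1, N)
    return (1 + m * N) * pow(r, N, N2) % N2          # (1 + N)^m * r^N mod N^2


def decrypt(c: int) -> float:
    m = (pow(c, LAM, N2) - 1) // N * MU % N
    if m > N // 2:                                   # map back to a signed value
        m -= N
    return m / SCALE


def he_add(c1: int, c2: int) -> int:
    return c1 * c2 % N2                              # ciphertext product = plaintext sum


def csahe_sum(gradients, noise_std=150.0, seed=0):
    """Ring aggregation: the initiator masks and encrypts, peers add ciphertexts."""
    rng = random.Random(seed)
    initiator, *others = gradients
    mask = [rng.gauss(0.0, noise_std) for _ in initiator]
    running = [encrypt(g + m, rng) for g, m in zip(initiator, mask)]
    for grad in others:                              # each peer sees only ciphertexts
        running = [he_add(c, encrypt(g, rng)) for c, g in zip(running, grad)]
    # Assumed final step: the initiator decrypts and subtracts its own mask.
    return [decrypt(c) - m for c, m in zip(running, mask)]


grads = [[0.10, -0.20], [0.30, 0.05], [-0.15, 0.40]]
total = csahe_sum(grads)
```

The recovered total matches the plain element-wise sum up to the fixed-point rounding introduced by SCALE.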

Adaptation: Fine-tuning (local adaptation) for <5 epochs after global training

Trainable Parameters: Full model parameters (U-Net or DenseNet)

Training Data:

RAD-ChestCT Dataset (Classification): 35,747 scans
Private COVID-19 Dataset (Segmentation): 180 scans from 5 hospitals

Key Hyperparameters:

global_epochs: 20
local_epochs: 10
batch_size: 64
learning_rate: 1e-4
random_noise_std_dev: >100 (for CSAHE masking)

Compute: Training time: 149.55 hours (PPPML-HMI) vs 110.37 hours (FedAvg) on 5 users. Hardware: NVIDIA V100 GPU, 120GB RAM.
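As a quick check of what these timings imply, the short Python calculation below derives the relative overhead from the two reported training times. The variable names are illustrative.

```python
pppml_hours = 149.55    # reported PPPML-HMI training time
fedavg_hours = 110.37   # reported FedAvg training time

extra_hours = pppml_hours - fedavg_hours           # absolute added time
overhead = pppml_hours / fedavg_hours - 1.0        # relative slowdown vs FedAvg

print(f"extra: {extra_hours:.2f} h, overhead: {overhead:.1%}")
```

The result is roughly 39 extra hours, or about 35% more wall-clock time than FedAvg in this 5-user setup.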

Comparison to Prior Work

vs. FedAvg: Adds meta-learning personalization and homomorphic encryption privacy
vs. FedReplay: Does not require auxiliary networks or structural modifications to the model
vs. NVFlare/FATE: Existing frameworks focus on standard FL; PPPML-HMI specifically targets personalization + HE privacy for medical imaging

Limitations

Requires at least 3 clients to be secure (vulnerable to initiator attack if N=2)
Higher computational and time cost (~35% slower than FedAvg) due to encryption overhead
Tested in simulated lab environment; does not account for real-world network latency or dropped connections
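The three-client requirement follows from simple arithmetic, sketched below in Python under the assumption that an honest-but-curious initiator learns the decrypted aggregate and knows its own gradient. The gradient values and function name are made up for illustration.

```python
def remove_own_contribution(aggregate, own_gradient):
    """What a curious initiator can compute once it knows the aggregate."""
    return [a - g for a, g in zip(aggregate, own_gradient)]


initiator = [0.5, -0.25]
peer_b = [1.0, 0.75]
peer_c = [-2.0, 0.5]

# N = 2: the aggregate minus the initiator's gradient is exactly the peer's gradient.
aggregate_2 = [a + b for a, b in zip(initiator, peer_b)]
leaked = remove_own_contribution(aggregate_2, initiator)

# N = 3: the same subtraction only yields the sum of the two peers' gradients.
aggregate_3 = [a + b + c for a, b, c in zip(initiator, peer_b, peer_c)]
mixed = remove_own_contribution(aggregate_3, initiator)
```

With two users the peer's update is fully exposed, while with three or more the initiator only obtains a mixture that does not identify either peer's individual update.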

Reproducibility

Code: https://github.com/JoshuaChou2018/PPPML-HMI

The code is publicly available at the repository above. The RAD-ChestCT dataset is public. The COVID-19 segmentation dataset from partner hospitals is available upon request.

📊 Experiments & Results

Evaluation Setup

Federated training across simulated hospital silos (RAD-ChestCT) and real-world heterogeneous hospital data (COVID-19)

Benchmarks:

RAD-ChestCT Classification (binary classification: healthy vs. patient)
COVID-19 Infection Segmentation (Medical Image Segmentation) [New]

Metrics:

Dice score
Accuracy
Recall
Statistical methodology: Five-fold cross-validation
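For reference, the Dice score on binary masks can be computed as in the minimal Python sketch below. The nested-list mask format, the function name, and the convention of returning 1.0 when both masks are empty are assumptions for illustration, not details taken from the paper.

```python
def dice_score(pred, target) -> float:
    """Dice = 2 * |P intersect T| / (|P| + |T|) for binary masks given as nested lists."""
    pred_flat = [int(bool(v)) for row in pred for v in row]
    target_flat = [int(bool(v)) for row in target for v in row]
    if len(pred_flat) != len(target_flat):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p & t for p, t in zip(pred_flat, target_flat))
    denom = sum(pred_flat) + sum(target_flat)
    if denom == 0:
        return 1.0      # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / denom


pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
score = dice_score(pred, target)   # 2 overlapping pixels, 3 + 3 foreground pixels
```

In this example two of the three predicted foreground pixels overlap the ground truth, giving a Dice score of 2 * 2 / 6.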

Key Results

Segmentation results on real-world COVID-19 data (5 hospitals) show PPPML-HMI recovering performance lost by standard FL due to heterogeneity.

Benchmark	Metric	Baseline	This Paper	Δ
COVID-19 Segmentation (Hospital A)	Dice score	0.51	0.62	+0.11
COVID-19 Segmentation (Hospital D)	Dice score	0.39	0.51	+0.12
COVID-19 Segmentation (Hospital E)	Dice score	0.63	0.68	+0.05
RAD-ChestCT Classification (Split 2, User A)	Accuracy	0.68	0.94	+0.26

Experiment Figures

A) Qualitative segmentation results. B) Gradient inversion attack (iDLG) reconstruction results.

Main Takeaways

PPPML-HMI significantly outperforms standard FedAvg in heterogeneous settings, often matching the performance of centralized training (oracle)
Standard FL can sometimes perform worse than local independent training when data is highly heterogeneous (e.g., Hospital D), whereas PPPML-HMI consistently improves over local baselines
The method is robust to varying numbers of users and sample sizes, as shown in the RAD-ChestCT splits
Privacy experiments confirm that while gradients in standard FL leak reconstructible images, the encrypted noise-masked gradients in PPPML-HMI produce only noise to attackers
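To give a sense of why plaintext gradients leak information, the Python sketch below shows the label-recovery idea associated with iDLG in its simplest form: for softmax cross-entropy with a one-hot label, the gradient with respect to the output-layer bias is softmax(logits) minus the one-hot vector, so only the true class has a negative entry. This covers only label recovery, not the image reconstruction reported in the paper; the logits and function names are made up for illustration.

```python
import math


def softmax(logits):
    m = max(logits)                                   # subtract max for stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def output_bias_gradient(logits, label):
    """Gradient of softmax cross-entropy w.r.t. the output bias: p - onehot(label)."""
    probs = softmax(logits)
    return [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]


def infer_label(bias_grad):
    """The true class is the only entry with a negative gradient."""
    return min(range(len(bias_grad)), key=bias_grad.__getitem__)


logits = [2.0, -1.0, 0.5]                             # hypothetical model outputs
recovered = [infer_label(output_bias_gradient(logits, y)) for y in range(3)]
```

An observer who sees this plaintext gradient recovers the label regardless of what the model predicted, which is the kind of leakage the encrypted, noise-masked gradients are meant to prevent.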

📚 Prerequisite Knowledge

Prerequisites

Federated Learning (FedAvg algorithm)
Homomorphic Encryption (CKKS scheme)
Meta-Learning (MAML concepts)
Medical Image Segmentation (U-Net architectures)

Key Terms

FL: Federated Learning—a technique to train models across decentralized devices holding local data samples, without exchanging them

PFL: Personalized Federated Learning—variants of FL designed to handle non-IID data by adapting the global model to individual users

HE: Homomorphic Encryption—encryption that allows computations (like addition) to be performed on ciphertext, yielding an encrypted result that decrypts to the correct operation on the plaintext

CSAHE: Cyclic Secure Aggregation with Homomorphic Encryption—the paper's novel protocol where users pass encrypted gradients in a ring topology to aggregate them without exposing individual updates

iDLG: Improved Deep Leakage from Gradients—an attack method that reconstructs private training data (like images) by analyzing the gradients shared during training

Per-FedAvg: Personalized Federated Averaging—an FL algorithm inspired by meta-learning (MAML) where the goal is to find a good initialization that adapts quickly to local tasks

Dice score: A metric for evaluating image segmentation accuracy, measuring the overlap between the predicted segmentation and the ground truth

non-IID: Non-Independent and Identically Distributed—data that does not follow the same distribution across all users (e.g., different scanner artifacts)

HBC attacker: Honest-But-Curious attacker—a participant who follows the protocol correctly but tries to infer private information from the legitimate messages they receive