Efficient Model Personalization in Federated Learning via Client-Specific Prompt Generation

📝 Paper Summary

Personalized Federated Learning, Parameter-Efficient Fine-Tuning (PEFT)

pFedPG enables efficient personalization of large frozen models in Federated Learning by learning a server-side generator that produces client-specific visual prompts based on local optimization feedback.

Core Problem

Standard Federated Learning struggles with data heterogeneity across clients, and adapting large foundation models (like ViT) is computationally expensive and bandwidth-heavy.

Why it matters:

Directly averaging prompts from heterogeneous clients (as in FedVPT) leads to suboptimal performance because a single set of prompts cannot capture diverse distributions
Fine-tuning entire large-scale models on edge devices is often infeasible due to limited compute and communication constraints
Existing personalization methods like Hypernetworks are typically restricted to small architectures and fail to scale to modern foundation models

Concrete Example: In a DomainNet setting where one client has 'Real' images and another has 'Sketch' images, averaging their learned prompts results in a generic prompt that fits neither domain well. pFedPG generates distinct prompts for the 'Sketch' client versus the 'Real' client.

Key Novelty

Client-Specific Prompt Generation (pFedPG)

Instead of aggregating model weights, the server learns a 'Prompt Generator' that produces unique visual prompts for each client using a learned client descriptor
The server optimizes this generator by treating the difference between updated local prompts and initial prompts as a gradient signal, learning to predict the optimal initialization for each client's specific data distribution

Architecture

Overview of the pFedPG framework, illustrating the interaction between the Server (Prompt Generation) and Clients (Prompt Adaptation)

Evaluation Highlights

+15.47 percentage points of top-1 accuracy on CIFAR-100 (disjoint label space) over FedVPT (Federated Visual Prompt Tuning): 70.96% vs. 55.49%
+7.48 percentage points of top-1 accuracy on DomainNet (domain heterogeneity) over FedVPT: 71.64% vs. 64.16%
Reduces communication cost by ~99.99% compared to full model fine-tuning methods (e.g., FedAvg, FedProx) by transmitting only prompts
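
The ~99.99% figure can be sanity-checked with a rough parameter count. The sketch below is a back-of-the-envelope estimate, not the paper's accounting: it assumes the commonly quoted size of about 86M parameters for ViT-B/16, a token width of 768, and prompts inserted at the input layer only. Under those assumptions the prompts come to roughly 0.009% of the backbone, in line with the reported order of magnitude.

```python
# Rough estimate of what is sent per client when only prompts are transmitted.
# Assumptions (not stated in the summary): ~86M backbone parameters,
# token width 768, prompts at the input layer only.
VIT_B16_PARAMS = 86_000_000   # approximate, commonly quoted figure
EMBED_DIM = 768               # ViT-B token width
NUM_PROMPTS = 10              # K from the hyperparameter list

prompt_params = NUM_PROMPTS * EMBED_DIM
fraction = prompt_params / VIT_B16_PARAMS

print(f"prompt parameters per client: {prompt_params}")
print(f"share of backbone size: {fraction:.4%}")
print(f"reduction vs. sending the backbone: {1 - fraction:.4%}")
```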

Breakthrough Assessment

7/10

Significant performance gains on heterogeneous data with high parameter efficiency. Cleverly adapts Hypernetwork concepts to Prompt Tuning, solving the scalability issue of previous personalized FL methods.

⚙️ Technical Details

Problem Definition

Setting: Personalized Federated Learning with N clients having heterogeneous datasets D_n, using a pre-trained frozen foundation model F*

Inputs: Client-specific data D_n = {(x_i, y_i)}

Outputs: Personalized visual prompts P_n and classification heads H_n for each client

Pipeline Flow

Server generates personalized prompts using Generator G
Client receives prompts, freezes backbone, and updates prompts/head on local data
Client sends prompt update direction (delta) back to Server
Server updates Generator G to minimize difference between generated and locally-optimized prompts
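
The four steps above can be sketched as a toy loop. This is a minimal illustration under strong assumptions, not the paper's implementation: the generator is a hypothetical linear map (P_n = base + weight · descriptor) instead of a cross-attention network, local training is replaced by SGD on a quadratic loss toward a per-client target, and the server learning rate (0.05) is chosen for the toy problem. Delta is taken as adapted minus generated prompt, so the server adds the chain-rule term.

```python
# Hypothetical stand-ins for two clients' ideal prompts.
TARGETS = {"real": [1.0, 1.0, 0.0, 0.0], "sketch": [0.0, 0.0, 1.0, 1.0]}

def generate(base, weight, descriptor):
    """Toy linear generator: P_n = base + weight @ descriptor."""
    return [b + sum(w * d for w, d in zip(row, descriptor))
            for b, row in zip(base, weight)]

def local_adapt(prompt, target, lr=0.25, steps=5):
    """Stand-in for local training: SGD on 0.5 * ||P - target||^2."""
    p = list(prompt)
    for _ in range(steps):
        p = [pi - lr * (pi - ti) for pi, ti in zip(p, target)]
    return p

def server_update(base, weight, descriptor, delta, lr):
    """Move generator parameters along (dP/dphi)^T @ delta."""
    # dP/d(base) is the identity, so base moves directly along delta.
    new_base = [b + lr * dl for b, dl in zip(base, delta)]
    # dP_i/d(weight[i][j]) = descriptor[j].
    new_weight = [[w + lr * dl * dj for w, dj in zip(row, descriptor)]
                  for row, dl in zip(weight, delta)]
    # dP/d(descriptor[j]) is column j of the (pre-update) weight.
    new_desc = [dj + lr * sum(row[j] * dl for row, dl in zip(weight, delta))
                for j, dj in enumerate(descriptor)]
    return new_base, new_weight, new_desc

def run(rounds=300, server_lr=0.05):
    base = [0.0] * 4
    weight = [[0.0, 0.0] for _ in range(4)]
    descriptors = {"real": [1.0, 0.0], "sketch": [0.0, 1.0]}
    for _ in range(rounds):
        for name, target in TARGETS.items():
            prompt = generate(base, weight, descriptors[name])   # server generates
            adapted = local_adapt(prompt, target)                # client adapts
            delta = [a - p for a, p in zip(adapted, prompt)]     # client sends delta
            base, weight, descriptors[name] = server_update(     # server updates G
                base, weight, descriptors[name], delta, server_lr)
    return {name: generate(base, weight, descriptors[name]) for name in TARGETS}

prompts = run()
for name, prompt in prompts.items():
    print(name, [round(v, 3) for v in prompt])
```

In this toy setting the generated prompts for the two clients separate toward their own targets, whereas a single averaged prompt would sit halfway between them.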

System Modules

Prompt Generator (G)

Generate client-specific prompts from basis vectors and client descriptors

Model or implementation: Cross-Attention Network
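
A minimal sketch of how a cross-attention generator could combine a shared prompt basis with a client descriptor. The summary only states that G is a cross-attention network; everything else here is an assumption made for illustration: the basis supplies the queries, the descriptor is a small set of tokens supplying keys and values, there is a single head and a residual connection, and all names (`generate_prompts`, `w_q`, `w_k`, `w_v`) are hypothetical.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def generate_prompts(basis, descriptor, w_q, w_k, w_v):
    """Single-head cross-attention with a residual connection (assumed design).

    basis:      K x d client-agnostic prompt basis (queries)
    descriptor: M x d client descriptor tokens (keys and values)
    """
    q = matmul(basis, w_q)
    k = matmul(descriptor, w_k)
    v = matmul(descriptor, w_v)
    scale = math.sqrt(len(q[0]))
    scores = [[sum(qi * ki for qi, ki in zip(q_row, k_row)) / scale
               for k_row in k] for q_row in q]
    attn = [softmax(row) for row in scores]
    mixed = matmul(attn, v)
    # Residual: each personalized prompt is its basis vector plus attended values.
    prompts = [[b + m for b, m in zip(b_row, m_row)]
               for b_row, m_row in zip(basis, mixed)]
    return prompts, attn

IDENTITY = [[1.0, 0.0], [0.0, 1.0]]
BASIS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # K = 3 prompts, d = 2
DESC_A = [[1.0, 0.0], [0.0, 0.0]]              # descriptor tokens, client A
DESC_B = [[0.0, 1.0], [0.0, 0.0]]              # descriptor tokens, client B

prompts_a, attn_a = generate_prompts(BASIS, DESC_A, IDENTITY, IDENTITY, IDENTITY)
prompts_b, attn_b = generate_prompts(BASIS, DESC_B, IDENTITY, IDENTITY, IDENTITY)
print(prompts_a)
print(prompts_b)
```

The same basis yields different prompts for the two descriptors, which is the property the server relies on to personalize without storing a separate prompt set per client.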

Local Learner

Adapt generated prompts to local private data

Model or implementation: Frozen ViT-B/16 with learnable prompts P_n and head H_n
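
To make the local learner's input concrete, the sketch below assembles the token sequence [c, P_n, z] and counts what is trained locally. It rests on assumptions not stated in the summary: 224x224 inputs, prompts inserted at the input layer only, and a single linear head. Patch size 16 and token width 768 are the standard ViT-B/16 values.

```python
# Assumed setup: 224x224 images, shallow prompts, linear classification head.
IMAGE_SIZE, PATCH_SIZE, EMBED_DIM = 224, 16, 768
NUM_PROMPTS = 10       # K from the hyperparameter list
NUM_CLASSES = 100      # e.g. CIFAR-100

def build_sequence(cls_token, prompts, patch_tokens):
    """Assemble [c, P_n, z]: class token, client prompts, patch embeddings."""
    return [cls_token] + list(prompts) + list(patch_tokens)

num_patches = (IMAGE_SIZE // PATCH_SIZE) ** 2
cls_token = [0.0] * EMBED_DIM
prompts = [[0.0] * EMBED_DIM for _ in range(NUM_PROMPTS)]
patch_tokens = [[0.0] * EMBED_DIM for _ in range(num_patches)]
sequence = build_sequence(cls_token, prompts, patch_tokens)

# Only the prompts and the head are updated locally; the backbone is frozen.
prompt_params = NUM_PROMPTS * EMBED_DIM
head_params = EMBED_DIM * NUM_CLASSES + NUM_CLASSES  # weights + biases
print("tokens per image:", len(sequence))
print("locally trainable parameters:", prompt_params + head_params)
```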

Novel Architectural Elements

Optimization of a server-side generator using aggregated local update directions (Delta P) rather than averaging weights
Cross-attention based generator architecture mapping client descriptors to prompt basis

Modeling

Base Model: ViT-B/16 (ImageNet-21k pre-trained)

Training Method: Personalized Federated Learning with alternating optimization

Objective Functions:

Purpose: Optimize local prompts for classification.

Formally: Cross-entropy loss L_n(H_n(F*([c, P_n, z])), y)

Purpose: Optimize server generator to match local update directions.

Formally: Update phi using gradients approximated by (nabla_phi P_n)^T * Delta P_n
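
Written out, the server step is a chain-rule approximation. The block below is a sketch under stated assumptions rather than the paper's exact formulation: d_n denotes the client descriptor (notation introduced here), Delta P_n is taken as the locally updated prompt minus the generated one, and the positive scale factor from the local learning rate gamma is absorbed into the server learning rate alpha. If Delta P_n is defined with the opposite sign, the sign of the update flips accordingly.

```latex
\begin{aligned}
P_n &= G_\phi(d_n), \qquad \Delta P_n = \tilde{P}_n - P_n
      = -\gamma \sum_{t} \nabla_{P_n} L_n^{(t)},\\
\nabla_\phi L_n &= (\nabla_\phi P_n)^\top \nabla_{P_n} L_n
      \;\propto\; -(\nabla_\phi P_n)^\top \Delta P_n \quad \text{(approximately)},\\
\phi &\leftarrow \phi + \alpha\, (\nabla_\phi P_n)^\top \Delta P_n .
\end{aligned}
```

The server therefore never needs client data or local gradients of the loss itself; the prompt difference returned by each client is enough to form the update.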

Adaptation: Visual Prompt Tuning (VPT)

Trainable Parameters: Prompts P_n (K=10 vectors), Classification Head H_n, Server Generator G, Client Descriptors D

Key Hyperparameters:

local_learning_rate_gamma: 0.25
server_learning_rate_alpha: 0.001
batch_size: 64
local_epochs: 5
communication_rounds: 100
num_prompts_K: 10 (3 for Dermoscopic-FL)
optimizer: SGD

Compute: NVIDIA TESLA V100 GPU (32GB)

Comparison to Prior Work

vs. FedVPT: Generates personalized prompts per client instead of enforcing a single global prompt set
vs. pFedHN: Generates only small prompt vectors instead of full model weights, enabling use with large ViT backbones
vs. FedRoD: Updates backbone prompts efficiently rather than requiring separate personalized classification heads or feature extractors [comparison not made in the paper in the context of prompt learning]

Limitations

Relies on the frozen backbone being sufficiently powerful; if domain shift is too extreme for the backbone, prompts alone may be insufficient
Introduces additional server-side parameters (Generator, Descriptors) compared to simple averaging, though communication cost remains low
Requires stable estimation of local gradients (Delta P_n) which might be noisy with very small local datasets

Reproducibility

No replication artifacts mentioned in the paper. Code URL is not provided. Datasets (Office-Caltech10, DomainNet, CIFAR) are public benchmarks.

📊 Experiments & Results

Evaluation Setup

Image classification under Federated Learning with non-IID data

Benchmarks:

Office-Caltech10 (Domain Adaptation)
DomainNet (Domain Adaptation)
CIFAR-100 (Imbalanced Class Distribution)
Dermoscopic-FL (Medical Image Diagnosis)

Metrics:

Top-1 Accuracy (%)
Communication Cost
Statistical methodology: Not explicitly reported in the paper

Key Results

Performance on Domain Adaptation benchmarks showing robustness to domain shift.
Benchmark	Metric	Baseline	This Paper	Δ
DomainNet	Top-1 Accuracy (%)	64.16	71.64	+7.48
Office-Caltech10	Top-1 Accuracy (%)	94.29	96.81	+2.52

Performance on Class Imbalance/Distribution Skew benchmarks.
Benchmark	Metric	Baseline	This Paper	Δ
CIFAR-100 (Disjoint)	Top-1 Accuracy (%)	55.49	70.96	+15.47
CIFAR-100 (Dirichlet 0.1)	Top-1 Accuracy (%)	45.26	55.91	+10.65
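
For readers unfamiliar with the "Dirichlet 0.1" setting: label-skewed splits are commonly built by drawing, for each class, client proportions from a Dirichlet distribution with a small concentration parameter, so most of a class lands on a few clients. The sketch below shows that common recipe using only the standard library. The paper's exact partitioning procedure is not given in this summary, and the function names are hypothetical.

```python
import random

def dirichlet_sample(alpha, size, rng):
    """Draw one symmetric Dirichlet(alpha) vector via normalized Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]

def partition_by_label(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients with per-class Dirichlet proportions."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    clients = [[] for _ in range(num_clients)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        shares = dirichlet_sample(alpha, num_clients, rng)
        start, cumulative = 0, 0.0
        for client, share in enumerate(shares):
            cumulative += share
            # The last client takes the remainder so every index is assigned once.
            if client == num_clients - 1:
                end = len(indices)
            else:
                end = round(cumulative * len(indices))
            clients[client].extend(indices[start:end])
            start = end
    return clients

labels = [i % 10 for i in range(1000)]          # toy dataset: 10 balanced classes
clients = partition_by_label(labels, num_clients=5, alpha=0.1)
for cid, indices in enumerate(clients):
    present = sorted({labels[i] for i in indices})
    print(f"client {cid}: {len(indices)} samples, classes {present}")
```

Smaller alpha values give more skewed clients; large values approach an even split of every class.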

Main Takeaways

pFedPG consistently outperforms FedVPT and other personalized FL baselines across both domain shift and label skew scenarios
Using a learned prompt generator performs substantially better than averaging prompts (FedVPT) or using a single global prompt basis
The method is extremely communication efficient, transmitting only prompt vectors (~0.01% of model size) compared to full parameter transmission
Ablation studies show that the cross-attention generator architecture outperforms MLP or AdaIN based generators

📚 Prerequisite Knowledge

Prerequisites

Federated Learning (FL) basics (FedAvg)
Vision Transformers (ViT)
Visual Prompt Tuning (VPT)
Hypernetworks

Key Terms

Federated Learning: A decentralized learning framework where clients train models locally and a server aggregates updates without accessing private data

Visual Prompt Tuning: A technique to adapt frozen vision models by inserting learnable parameters (prompts) into the input space

ViT: Vision Transformer—a model architecture that processes images as sequences of patches using self-attention

Non-IID: Non-Independent and Identically Distributed; refers to data heterogeneity where clients have different label distributions or domain shifts

Hypernetwork: A neural network that generates the weights for another neural network

FedVPT: Federated Visual Prompt Tuning—a baseline method that applies standard FedAvg aggregation to visual prompts

Client Descriptor: A learnable vector maintained by the server that encodes specific characteristics of a client to condition the prompt generation

Prompt Basis: A set of client-agnostic prompt embeddings stored at the server, used as the source material for generating personalized prompts