Federated Active Learning Under Extreme Non-IID and Global Class Imbalance

📝 Paper Summary

Federated Active Learning (FAL), Non-IID Data Learning

FairFAL improves federated active learning by adaptively selecting between global and local query models and employing prototype-guided sampling to ensure class balance even under severe distribution skew.

Core Problem

Existing Federated Active Learning methods fail when global data is severely long-tailed and clients are highly heterogeneous, as they bias sampling toward majority classes and rely on rigid, suboptimal query models.

Why it matters:

Real-world federated systems (e.g., mobile devices, hospitals) often contain rare but critical classes that appear sparsely, creating long-tailed global distributions
Current strategies treat heterogeneity merely as a partitioning issue, ignoring the global class imbalance that degrades performance on minority categories
Annotation budgets in decentralized systems are severely constrained, making inefficient or biased sampling a critical failure point

Concrete Example: In a federated system with severe global imbalance (many 'dog' images, few 'platypus' images), a standard local model on a client with only 'dogs' will confidently query more 'dogs', exacerbating the global imbalance. FairFAL detects this skew and switches strategies to prioritize the rare 'platypus' class.

Key Novelty

Adaptive Class-Fair Federated Active Learning (FairFAL)

Dynamically selects either the global or local model for querying based on real-time estimates of global imbalance and local-global distribution divergence
Uses global feature prototypes to assign pseudo-labels to unlabeled data, guiding the sampling process to specifically target minority classes
Refines queries using a two-stage strategy: first selecting high-uncertainty candidates per class, then enforcing diversity via gradient-embedding clustering

Breakthrough Assessment

7/10

Addresses a realistic and under-explored intersection of non-IID and long-tailed data in FAL. The adaptive model selection is a logical and empirically grounded contribution.

⚙️ Technical Details

Problem Definition

Setting: Federated Active Learning with K clients, where global data is long-tailed (imbalance ratio ρ) and local data is non-IID (Dirichlet parameter α).

Inputs: Distributed local datasets with small labeled sets D_L and large unlabeled pools D_U.

Outputs: Trained global model parameters θ_g, optimized over the aggregated labeled data.
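To make the setting concrete, here is a minimal sketch of how such a benchmark is commonly simulated: an exponentially decaying class profile whose head-to-tail ratio equals ρ, followed by a per-class Dirichlet(α) split across K clients. The function names, the exponential profile, and the example sizes are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def long_tailed_counts(n_max, num_classes, rho):
    """Per-class sample counts decaying exponentially from n_max to n_max / rho."""
    exponents = np.arange(num_classes) / max(num_classes - 1, 1)
    counts = np.round(n_max * rho ** (-exponents)).astype(int)
    return np.maximum(counts, 1)

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients; each class is divided by Dir(alpha) shares."""
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        shares = rng.dirichlet(alpha * np.ones(num_clients))
        # Cut points that turn the shares into contiguous slices of idx.
        cuts = (np.cumsum(shares)[:-1] * len(idx)).astype(int)
        for k, part in enumerate(np.split(idx, cuts)):
            client_indices[k].extend(part.tolist())
    return client_indices

# Example: 10 classes, rho = 20, 10 clients, strongly non-IID (alpha = 0.1).
counts = long_tailed_counts(n_max=500, num_classes=10, rho=20)
labels = np.repeat(np.arange(10), counts)
parts = dirichlet_partition(labels, num_clients=10, alpha=0.1)
```

With a small α most of a class lands on one or two clients, while a large α (for example 100) gives every client a roughly proportional share of each class.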

Pipeline Flow

Global Imbalance Estimation (One-time)
Per-Round: Federated Training
Per-Round: Adaptive Model Selection
Per-Round: Prototype-Guided Querying (Candidate Selection -> Diversity Refinement)

System Modules

Stat Estimator (Adaptive Selection)

Estimate global imbalance ratio (γ) and local-global divergence (d_k)

Model or implementation: Statistical aggregation
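A minimal sketch of what this aggregation could look like, assuming each client reports the class counts of its labeled set. The summary does not give the exact formulas, so the definitions below (γ as the ratio of the largest to the smallest global class count, d_k as the total variation distance between local and global class distributions) and the name `estimate_statistics` are assumptions for illustration.

```python
import numpy as np

def estimate_statistics(client_counts, eps=1e-12):
    """client_counts: (K, C) array of labeled-sample counts per client and class.

    Returns (gamma, d), where d[k] is the divergence of client k.
    """
    client_counts = np.asarray(client_counts, dtype=float)
    global_counts = client_counts.sum(axis=0)
    # Assumed definition: most frequent class count over least frequent class count.
    gamma = global_counts.max() / max(global_counts.min(), 1.0)
    global_dist = global_counts / global_counts.sum()
    local_totals = np.maximum(client_counts.sum(axis=1, keepdims=True), eps)
    local_dist = client_counts / local_totals
    # Assumed divergence: total variation distance, which lies in [0, 1].
    d = 0.5 * np.abs(local_dist - global_dist).sum(axis=1)
    return gamma, d
```

For two clients holding [100, 0] and [0, 100] labeled samples, the global distribution is balanced (γ = 1) while both clients sit at d_k = 0.5, the pattern expected under strong non-IID partitioning of balanced data.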

Model Selector (Adaptive Selection)

Decide whether to use the global model or local model for querying

Model or implementation: Decision Rule
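The exact rule is not reproduced in this summary. The sketch below encodes only the qualitative finding reported in the takeaways: query with the global model when the global distribution is highly imbalanced and the client is relatively homogeneous, otherwise query with the local model. The function name, the `gamma_threshold` value, and the use of δ = 0.75 as a bound on d_k are all assumptions.

```python
def select_query_model(gamma, d_k, gamma_threshold=2.0, delta=0.75):
    """Return "global" or "local" as the query model for one client.

    gamma: estimated global imbalance ratio (>= 1).
    d_k: local-global divergence of this client, assumed to lie in [0, 1].
    gamma_threshold and the role of delta are hypothetical choices.
    """
    highly_imbalanced = gamma >= gamma_threshold
    relatively_homogeneous = d_k <= delta
    if highly_imbalanced and relatively_homogeneous:
        return "global"
    return "local"
```

Under these assumed thresholds, `select_query_model(20, 0.1)` picks the global model, while a balanced setting (`gamma=1`) or a highly divergent client (`d_k=0.9`) falls back to the local model.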

Prototype Generator (Querying)

Compute class prototypes to guide sampling

Model or implementation: Global Model Feature Extractor

Pseudo-Labeler (Querying)

Assign pseudo-labels to unlabeled pool to enable class-aware sampling

Model or implementation: Nearest Prototype Classifier
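The prototype and pseudo-labeling steps can be sketched as follows, assuming features have already been extracted by the global model. Euclidean distance and the function names are assumptions; the paper may use a different metric, and the cross-client aggregation of prototypes (for example a count-weighted average on the server) is omitted here.

```python
import numpy as np

def class_prototypes(features, labels, num_classes):
    """Mean feature vector per class; classes with no labeled sample get NaN rows."""
    protos = np.full((num_classes, features.shape[1]), np.nan)
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            protos[c] = features[mask].mean(axis=0)
    return protos

def nearest_prototype_labels(features, protos):
    """Pseudo-label each sample with the class of its closest prototype."""
    dists = np.linalg.norm(features[:, None, :] - protos[None, :, :], axis=2)
    # Classes without a prototype (NaN rows) can never be the nearest one.
    dists = np.where(np.isnan(dists), np.inf, dists)
    return dists.argmin(axis=1)

# Toy usage with 2-D features.
labeled = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 10.0]])
protos = class_prototypes(labeled, np.array([0, 0, 1]), num_classes=2)
pseudo = nearest_prototype_labels(np.array([[1.0, 1.0], [9.0, 9.0]]), protos)
```

The pseudo-labels are only used to group the unlabeled pool by class for sampling, so occasional mistakes cost query budget rather than corrupting training labels.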

Two-Stage Sampler (Querying)

Select final query set balancing uncertainty and diversity

Model or implementation: Heuristic + k-center
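A minimal sketch of the two stages, assuming predictive entropy as the uncertainty score and greedy k-center (farthest-point) selection over precomputed embeddings. In the paper the embeddings are gradient embeddings; here any array can be passed in. The function names, the `per_class` parameter, and the arbitrary starting point of the greedy search are illustrative assumptions.

```python
import numpy as np

def entropy(probs, eps=1e-12):
    """Predictive entropy of each row of class probabilities."""
    return -(probs * np.log(probs + eps)).sum(axis=1)

def k_center_greedy(points, k):
    """Greedy farthest-point selection; returns indices into points."""
    if k <= 0 or len(points) == 0:
        return []
    selected = [0]  # arbitrary starting point
    min_dist = np.linalg.norm(points - points[0], axis=1)
    while len(selected) < min(k, len(points)):
        nxt = int(min_dist.argmax())
        selected.append(nxt)
        new_dist = np.linalg.norm(points - points[nxt], axis=1)
        min_dist = np.minimum(min_dist, new_dist)
    return selected

def two_stage_query(probs, pseudo_labels, embeddings, budget, per_class):
    """Stage 1: per-class uncertainty screening. Stage 2: diversity via k-center."""
    scores = entropy(probs)
    candidates = []
    for c in np.unique(pseudo_labels):
        idx = np.flatnonzero(pseudo_labels == c)
        top = idx[np.argsort(-scores[idx])[:per_class]]
        candidates.extend(top.tolist())
    candidates = np.array(candidates, dtype=int)
    picked = k_center_greedy(embeddings[candidates], budget)
    return candidates[picked].tolist()
```

Screening per pseudo-class before the diversity step keeps minority classes in the candidate pool even when the majority class dominates the raw uncertainty ranking.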

Novel Architectural Elements

Adaptive switching mechanism that selects between local and global query models per-client based on distribution statistics
Two-stage sampling pipeline combining uncertainty screening with gradient-embedding diversity in a class-aware manner

Modeling

Base Model: Not restricted (experiments use CIFAR-10 compatible architectures, likely ResNet/CNN)

Training Method: Federated Averaging (FedAvg) combined with Active Learning rounds
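For readers unfamiliar with the aggregation step, FedAvg computes a weighted average of client parameters, with weights proportional to each client's number of training samples. The sketch below assumes parameters are stored as dictionaries of NumPy arrays; the function name and data layout are illustrative.

```python
import numpy as np

def fedavg(client_params, client_sizes):
    """Weighted average of client parameter dicts (name -> ndarray)."""
    weights = np.asarray(client_sizes, dtype=float)
    weights = weights / weights.sum()
    return {
        name: sum(w * p[name] for w, p in zip(weights, client_params))
        for name in client_params[0]
    }

# Two clients with 1 and 3 labeled samples respectively.
merged = fedavg(
    [{"w": np.array([0.0, 0.0])}, {"w": np.array([4.0, 4.0])}],
    client_sizes=[1, 3],
)
```

In an active learning loop the client sizes change every round as newly labeled samples are added, so the weights would be recomputed before each aggregation.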

Key Hyperparameters:

query_budget_per_round: 5% of training data
model_selection_threshold_delta: 0.75
dirichlet_alpha: {0.1, 100}
imbalance_ratio_rho: {1, 20}

Compute: Not reported in the paper

Comparison to Prior Work

vs. LoGo: FairFAL adaptively selects the query model instead of a fixed hybrid approach
vs. KAFAL/IFAL: FairFAL explicitly addresses global long-tail imbalance via prototype-guided class-fair sampling, whereas others focus primarily on heterogeneity

Limitations

Relies on the availability of a labeled initialization set to estimate distribution statistics
Gradient embedding computation for diversity sampling adds computational overhead on clients
The threshold δ is fixed at 0.75; a sensitivity analysis is not detailed in the extracted text

Reproducibility

Code: https://github.com/chenchenzong/FairFAL

Detailed experimental settings (CIFAR-10, α and ρ values) are provided in the Observation section.

📊 Experiments & Results

Evaluation Setup

Federated Active Learning on image classification tasks under varying degrees of non-IID and global imbalance.

Benchmarks:

CIFAR-10 (Image Classification)
Four other benchmarks (Image Classification)

Metrics:

Mean Test Accuracy
Area Under Learning Curve (AULC)
Statistical methodology: Paired analysis using Positive Ratio, One-sided Wilcoxon p-value, and Hodges-Lehmann estimator over 5 random seeds.
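The metrics and paired statistics above can be sketched in a few lines. The normalization of AULC by the number of intervals is an assumption, as are the function names; the Hodges-Lehmann estimator follows its standard one-sample definition as the median of Walsh averages of the paired differences.

```python
import numpy as np

def aulc(accuracies):
    """Trapezoidal area under the accuracy-per-round curve, divided by the number of intervals."""
    acc = np.asarray(accuracies, dtype=float)
    return float(((acc[:-1] + acc[1:]) / 2).sum() / (len(acc) - 1))

def positive_ratio(a, b):
    """Fraction of paired runs in which method a beats method b."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float((diff > 0).mean())

def hodges_lehmann(a, b):
    """Median of Walsh averages (d_i + d_j) / 2 for i <= j of paired differences."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    i, j = np.triu_indices(len(d))
    return float(np.median((d[i] + d[j]) / 2))
```

The one-sided Wilcoxon signed-rank p-value is not reimplemented here; if SciPy is available, `scipy.stats.wilcoxon(a, b, alternative="greater")` computes it for paired samples.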

Main Takeaways

The model that achieves more class-balanced sampling (especially for minority classes) consistently leads to better final performance.
Global model querying is beneficial only when the global distribution is highly imbalanced and client data are relatively homogeneous.
Local model querying is preferable when the global distribution is balanced or clients are highly heterogeneous (non-IID).
The global model consistently outperforms the local model for diversity-based sampling (e.g., Coreset) due to better feature representations.

📚 Prerequisite Knowledge

Prerequisites

Federated Learning (FedAvg)
Active Learning (Uncertainty Sampling, Coreset)
Statistical heterogeneity in distributed data

Key Terms

FAL: Federated Active Learning—training models across decentralized clients while selectively querying only the most informative samples for labeling

Non-IID: Non-Independent and Identically Distributed—data distributions differ across clients (e.g., one hospital has only flu cases, another only fractures)

Global Class Imbalance: The aggregate distribution of classes across all clients is skewed (long-tailed), with some classes being much rarer than others

Dirichlet partition: A method to simulate non-IID data partitions, where a concentration parameter α controls the degree of heterogeneity (lower α = more heterogeneous)

k-center: A diversity-based sampling algorithm that selects a set of points to minimize the maximum distance between any data point and its nearest selected point