
Federated Active Learning Under Extreme Non-IID and Global Class Imbalance

Chen-Chen Zong, Sheng-Jun Huang
Nanjing University of Aeronautics and Astronautics, China
arXiv (2026)
P13N Benchmark

📝 Paper Summary

Federated Active Learning (FAL), Non-IID Data Learning
FairFAL improves federated active learning by adaptively selecting between global and local query models and employing prototype-guided sampling to ensure class balance even under severe distribution skew.
Core Problem
Existing Federated Active Learning methods fail when global data is severely long-tailed and clients are highly heterogeneous, as they bias sampling toward majority classes and rely on rigid, suboptimal query models.
Why it matters:
  • Real-world federated systems (e.g., mobile devices, hospitals) often contain rare but critical classes that appear sparsely, creating long-tailed global distributions
  • Current strategies treat heterogeneity merely as a partitioning issue, ignoring the global class imbalance that degrades performance on minority categories
  • Annotation budgets in decentralized systems are severely constrained, making inefficient or biased sampling a critical failure point
Concrete Example: In a federated system with severe global imbalance (many 'dog' images, few 'platypus' images), a standard local query model on a client that holds only 'dog' images will confidently query more 'dog' images, worsening the global imbalance. FairFAL detects this skew and switches its query strategy to prioritize the rare 'platypus' class.
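To make the idea of "detecting skew" concrete, here is a minimal sketch of the two statistics the summary mentions: a global imbalance estimate and a local-global distribution divergence. The summary does not give the paper's actual estimators or decision rule, so everything here is an assumption for illustration: the function names, the max/min imbalance ratio, the use of KL divergence, the thresholds, and the direction of the switch are all hypothetical.

```python
import numpy as np

def imbalance_ratio(class_counts):
    """Ratio of the largest to the smallest class count (hypothetical estimator)."""
    counts = np.asarray(class_counts, dtype=float)
    # Clamp the denominator to 1 so a class with zero samples does not divide by zero.
    return counts.max() / max(counts.min(), 1.0)

def kl_divergence(p_counts, q_counts, eps=1e-8):
    """KL(p || q) between two class-count vectors after smoothing and normalizing."""
    p = np.asarray(p_counts, dtype=float) + eps
    q = np.asarray(q_counts, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

def choose_query_model(local_counts, global_counts,
                       ratio_threshold=10.0, divergence_threshold=0.5):
    """Hypothetical rule: use the global model only when the global distribution
    is long-tailed AND the client's distribution diverges strongly from it.
    Both thresholds are illustrative values, not taken from the paper."""
    skewed = imbalance_ratio(global_counts) >= ratio_threshold
    divergent = kl_divergence(local_counts, global_counts) >= divergence_threshold
    return "global" if (skewed and divergent) else "local"

# Long-tailed global distribution, client whose data looks very different.
print(choose_query_model([0, 5, 50], [1000, 100, 10]))
# Balanced global distribution, client that matches it.
print(choose_query_model([100, 100, 100], [100, 100, 100]))
```

In a real federated setting the global class counts would have to be estimated from labeled data aggregated across clients, which this sketch simply takes as given.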
Key Novelty
Adaptive Class-Fair Federated Active Learning (FairFAL)
  • Dynamically selects either the global or local model for querying based on real-time estimates of global imbalance and local-global distribution divergence
  • Uses global feature prototypes to assign pseudo-labels to unlabeled data, guiding the sampling process to specifically target minority classes
  • Refines queries using a two-stage strategy: first selecting high-uncertainty candidates per class, then enforcing diversity via gradient-embedding clustering
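The prototype-guided sampling and the first stage of the two-stage query can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: it assumes nearest-prototype assignment by squared Euclidean distance, predictive entropy as the uncertainty score, and a caller-supplied per-class budget (`per_class_budget` is a hypothetical parameter). The second stage, diversity via gradient-embedding clustering, is omitted.

```python
import numpy as np

def pseudo_label_by_prototype(features, prototypes):
    """Assign each unlabeled feature vector to its nearest class prototype.

    features:   array of shape (n_samples, dim)
    prototypes: array of shape (n_classes, dim), one global prototype per class
    """
    # Squared Euclidean distance from every sample to every prototype.
    dists = ((features[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def entropy(probs, eps=1e-12):
    """Predictive entropy per row of a (n_samples, n_classes) probability matrix."""
    return -(probs * np.log(probs + eps)).sum(axis=1)

def class_balanced_candidates(pseudo_labels, uncertainty, per_class_budget):
    """Stage one: pick the most uncertain samples within each pseudo-class.

    per_class_budget[c] is how many candidates to draw for class c; giving
    minority classes a larger budget is one way to steer queries toward them.
    """
    selected = []
    for c, budget in enumerate(per_class_budget):
        idx = np.flatnonzero(pseudo_labels == c)
        ranked = idx[np.argsort(-uncertainty[idx])]  # most uncertain first
        selected.extend(ranked[:budget].tolist())
    return selected

# Toy usage: two classes, four unlabeled samples.
features = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [4.9, 5.1]])
prototypes = np.array([[0.0, 0.0], [5.0, 5.0]])
labels = pseudo_label_by_prototype(features, prototypes)
uncertainty = np.array([0.2, 0.9, 0.1, 0.5])
picked = class_balanced_candidates(labels, uncertainty, per_class_budget=[1, 1])
print(labels.tolist(), picked)
```

Selecting per pseudo-class rather than from one global ranking keeps a majority class from absorbing the whole annotation budget, which is the failure mode described under Core Problem.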
Breakthrough Assessment
7/10
Addresses a realistic and under-explored intersection of non-IID and long-tailed data in FAL. The adaptive model selection is a logical and empirically grounded contribution.