
SleepFM: Multi-modal Representation Learning for Sleep Across Brain Activity, ECG and Respiratory Signals

R. Thapa, Bryan He, Magnus Ruud Kjær, Hyatt E. Moore IV, Gauri Ganjoo, Emmanuel Mignot, James Zou
Stanford University
International Conference on Machine Learning (2024)
MM Pretraining Benchmark

📝 Paper Summary

Multi-modal Representation Learning, Healthcare/Medical AI, Time-series Analysis
SleepFM is presented as the first multi-modal foundation model for sleep analysis. It is trained on more than 100,000 hours of polysomnography (PSG) recordings with a novel leave-one-out contrastive learning approach that integrates brain, cardiac, and respiratory signals.
Core Problem
Traditional sleep analysis relies on labor-intensive manual inspection or narrow supervised models that fail to leverage the full breadth of unlabeled physiological dynamics across diverse PSG sensors.
Why it matters:
  • Sleep monitoring is critical for diagnosing disorders and assessing overall brain, pulmonary, and cardiac health.
  • Existing supervised methods are limited by the availability of labeled data and do not exploit the rich, unlabeled relationships between physiological modalities (brain, heart, lungs).
Concrete Example: A standard supervised CNN might classify sleep stages from labeled EEG alone. It would miss subtle correlations between heart rate variability (ECG) and breathing patterns (respiratory channels) that indicate sleep-disordered breathing, which lowers diagnostic accuracy.
Key Novelty
Leave-One-Out Contrastive Learning for Multi-modal Sleep Signals
  • Instead of aligning only pairs of signals (e.g., EEG vs. ECG), the model trains each modality's embedding to match the average embedding of the remaining modalities.
  • This encourages each physiological signal (brain, heart, or lung) to capture global semantic information that is aligned with the patient's overall physiological state.
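The leave-one-out objective described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the InfoNCE-style loss, the temperature value, the L2 normalization, the modality keys, and the function names `info_nce` and `leave_one_out_loss` are all hypothetical choices made for this example.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def info_nce(anchor, target, temperature=0.1):
    """InfoNCE-style loss (assumed form): row i of `anchor` should match row i of `target`.

    Both inputs have shape (batch, dim). All other rows in the batch act as negatives.
    """
    logits = l2_normalize(anchor) @ l2_normalize(target).T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


def leave_one_out_loss(embeddings, temperature=0.1):
    """Contrast each modality against the mean embedding of all other modalities.

    `embeddings` maps a modality name to an array of shape (batch, dim),
    where row i in every array comes from the same recording clip.
    """
    names = list(embeddings)
    total = 0.0
    for name in names:
        # Leave-one-out target: average of every modality except `name`.
        others = np.mean([embeddings[m] for m in names if m != name], axis=0)
        total += info_nce(embeddings[name], others, temperature)
    return total / len(names)


# Toy usage with random stand-ins for encoder outputs (hypothetical shapes).
rng = np.random.default_rng(0)
clip_state = rng.normal(size=(8, 16))
toy = {m: clip_state + 0.01 * rng.normal(size=clip_state.shape)
       for m in ("brain", "ecg", "respiratory")}
print("leave-one-out loss on aligned toy embeddings:", leave_one_out_loss(toy))
```

A pairwise variant would instead call `info_nce` once per pair of modalities. The leave-one-out form uses one averaged target per modality, so each encoder is pushed toward a representation shared by all remaining signals rather than toward any single partner.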
Architecture
Figure 1: Schematic of the contrastive learning frameworks (pairwise vs. leave-one-out) used to train SleepFM.
Evaluation Highlights
  • SleepFM (logistic regression on embeddings) outperforms end-to-end supervised CNNs on sleep stage classification (AUROC 0.88 vs 0.72).
  • Detects sleep-disordered breathing (SDB) more accurately than supervised CNNs (AUROC 0.85 vs 0.69).
  • Retrieves the matching recording clip across modalities with 48% top-1 accuracy among 90,000 candidates (vs ~0.001% by random chance).
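To make the retrieval figure concrete, the sketch below shows one plausible way to compute top-1 cross-modal retrieval accuracy and the corresponding chance level. It assumes cosine similarity as the ranking score and that row i of the query and candidate arrays come from the same clip. The function names are hypothetical and the evaluation protocol in the paper may differ in detail.

```python
import numpy as np


def top1_retrieval_accuracy(query, candidates):
    """Fraction of queries whose nearest candidate (by cosine similarity) is the true match.

    `query` and `candidates` have shape (n, dim); row i of each is assumed
    to come from the same clip in two different modalities.
    """
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    nearest = np.argmax(q @ c.T, axis=1)
    return float(np.mean(nearest == np.arange(len(q))))


def chance_top1_percent(num_candidates):
    """Top-1 accuracy of uniform random guessing, as a percentage."""
    return 100.0 / num_candidates


# With 90,000 candidates, random guessing is right about 0.0011% of the time.
print(f"chance level: {chance_top1_percent(90_000):.4f}%")
```

Under these assumptions, a reported 48% top-1 accuracy is several orders of magnitude above chance, which is the comparison the highlight is drawing.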
Breakthrough Assessment
8/10
First comprehensive multi-modal foundation model for sleep, trained on a large real-world dataset (100,000+ hours of PSG). The leave-one-out contrastive approach shows empirical gains over standard pairwise contrastive learning.