
DrFuse: Learning Disentangled Representation for Clinical Multi-Modal Fusion with Missing Modality and Modal Inconsistency

Wenfang Yao, Kejing Yin, William K. Cheung, Jia Liu, Jing Qin
AAAI Conference on Artificial Intelligence (2024)
MM Benchmark

📝 Paper Summary

Clinical Multi-Modal Learning · Missing Modality Imputation · Disentangled Representation Learning
DrFuse improves clinical prediction by disentangling modality-shared from modality-distinct features to handle missing data and using disease-aware attention to resolve conflicting modal information.
Core Problem
Clinical multi-modal learning faces two key challenges: frequent missing modalities (e.g., lack of X-rays) and inconsistent or contradictory signals between EHR and imaging data.
Why it matters:
  • In real-world datasets such as MIMIC-IV, fewer than 20% of patients have X-ray images, so standard fusion methods that require every modality cannot be applied to most patients.
  • EHR and images can provide contradictory risk signals (e.g., meningitis symptoms recorded in the EHR alongside a normal chest X-ray), and standard fusion models have no mechanism for resolving such conflicts.
  • The diagnostic importance of each modality varies significantly depending on the specific patient and disease target.
Concrete Example: In mortality prediction, a patient with meningitis may appear high-risk in the EHR because of their symptoms, while their chest X-ray (CXR) appears normal. A standard fusion model might simply average these conflicting signals and underestimate the risk. DrFuse's disease-aware attention, trained with a ranking loss, instead learns to weight the EHR more heavily in this disease context.
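The disease-aware weighting in this example can be pictured with a minimal sketch. It assumes standard scaled dot-product attention, with one learned query vector per disease attending over three representation slots (EHR-distinct, shared, CXR-distinct). The slot names, the toy vectors, and `disease_attention` are hypothetical and do not come from the paper's code.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]

def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def disease_attention(query: Vector, reps: Dict[str, Vector]) -> Tuple[Dict[str, float], Vector]:
    """Scaled dot-product attention of one disease query over modality slots.

    query: hypothetical learned vector for a single disease.
    reps:  slot name -> representation vector, all the same length as query.
    Returns the per-slot attention weights and their weighted sum.
    """
    names = list(reps)
    scale = math.sqrt(len(query))
    scores = [sum(q * r for q, r in zip(query, reps[n])) / scale for n in names]
    weights = softmax(scores)
    fused = [
        sum(w * reps[n][i] for w, n in zip(weights, names))
        for i in range(len(query))
    ]
    return dict(zip(names, weights)), fused

# Toy 2-d representations (hypothetical values, not from the paper).
reps = {
    "ehr_distinct": [2.0, 0.0],
    "shared": [1.0, 1.0],
    "cxr_distinct": [0.0, 2.0],
}
meningitis_query = [3.0, -1.0]  # hypothetical query aligned with the EHR direction
weights, fused = disease_attention(meningitis_query, reps)
print(weights)  # the EHR-distinct slot receives the largest weight
```

Because each disease has its own query, the same patient representations can yield different modality weights for different prediction targets.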
Key Novelty
Disentangled Representation with Disease-Aware Attention
  • Decomposes each input into 'shared' (common to both EHR and CXR) and 'distinct' (unique to one modality) representations, so that a missing modality can be handled robustly.
  • Aligns the shared representations of the two modalities by minimizing their Jensen-Shannon divergence, so the shared component can be recovered from the available modality when the other is missing.
  • Uses a margin ranking loss to encourage the attention module to assign a higher weight to the modality that gives more accurate predictions for the specific disease being predicted.
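The two training objectives in this list can be sketched in plain Python. The sketch rests on three assumptions: each shared representation is first mapped to a probability vector with a softmax so that the Jensen-Shannon divergence is defined, the "more accurate" modality for a disease is the one with the lower per-modality prediction loss, and the helper names are hypothetical. The ranking term uses the same formula as PyTorch's `MarginRankingLoss`, max(0, -y(x1 - x2) + margin).

```python
import math
from typing import List

def softmax(v: List[float]) -> List[float]:
    top = max(v)
    exps = [math.exp(x - top) for x in v]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, bounded above by ln 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def margin_ranking(x1, x2, y, margin=0.1):
    """max(0, -y * (x1 - x2) + margin); y = +1 means x1 should exceed x2."""
    return max(0.0, -y * (x1 - x2) + margin)

def attention_rank_loss(attn_ehr, attn_cxr, loss_ehr, loss_cxr, margin=0.1):
    """Hypothetical rule: the modality whose own prediction loss is lower for this
    disease should receive the higher attention weight, by at least `margin`."""
    y = 1.0 if loss_ehr < loss_cxr else -1.0
    return margin_ranking(attn_ehr, attn_cxr, y, margin)

# Shared encodings turned into probability vectors (an assumption for this sketch).
shared_ehr = softmax([1.0, 2.0, 0.5])
shared_cxr = softmax([0.5, 2.5, 0.0])
print(js_divergence(shared_ehr, shared_cxr))  # positive, since the two vectors differ

# EHR is more accurate here (lower loss) but gets less attention, so the penalty is about 0.7.
print(attention_rank_loss(attn_ehr=0.2, attn_cxr=0.8, loss_ehr=0.3, loss_cxr=0.9))
```

The ranking loss is zero once the more accurate modality leads by at least the margin, so it only reshapes the attention weights when they disagree with per-modality accuracy.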
Architecture
Figure 1: Overview of the DrFuse framework, showing the parallel encoding of EHR and CXR, the extraction of shared and distinct features, and the fusion mechanism.
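The missing-modality path through this framework can be illustrated with a hypothetical helper, `assemble_slots`, that decides which representations reach the fusion step. This sketch makes two simplifying assumptions that the summary does not confirm: it averages the two shared encodings when both modalities exist, and it drops the CXR-distinct slot when no X-ray is available.

```python
from typing import Dict, List, Optional

Vector = List[float]

def average(a: Vector, b: Vector) -> Vector:
    return [(x + y) / 2 for x, y in zip(a, b)]

def assemble_slots(
    ehr_shared: Vector,
    ehr_distinct: Vector,
    cxr_shared: Optional[Vector] = None,
    cxr_distinct: Optional[Vector] = None,
) -> Dict[str, Vector]:
    """Collect the representations that the fusion/attention step can use.

    Hypothetical choices for illustration:
    - both modalities present: shared slot = mean of the two shared encodings;
    - CXR missing: shared slot = EHR shared encoding alone, and the CXR-distinct
      slot is left out so attention only weighs the available slots.
    """
    if (cxr_shared is None) != (cxr_distinct is None):
        raise ValueError("CXR shared and distinct encodings must be both present or both missing")
    slots = {"ehr_distinct": ehr_distinct}
    if cxr_shared is None:
        slots["shared"] = ehr_shared
    else:
        slots["shared"] = average(ehr_shared, cxr_shared)
        slots["cxr_distinct"] = cxr_distinct
    return slots

# Patient without a chest X-ray: only two slots are available for fusion.
print(assemble_slots([1.0, 1.0], [2.0, 0.0]))
# Patient with both modalities: three slots, with the shared slot averaged.
print(assemble_slots([1.0, 1.0], [2.0, 0.0], [3.0, 1.0], [0.0, 2.0]))
```

A fusion step that normalizes its weights over whichever slots are present can then be applied unchanged to patients with and without images.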
Breakthrough Assessment
7/10
Addresses the critical and under-explored issue of modal inconsistency in clinical data. The disentanglement approach to missing modalities is theoretically sound, though the excerpt summarized here does not report quantitative results.