
Adaptive Clinical-Aware Latent Diffusion for Multimodal Brain Image Generation and Missing Modality Imputation

Rong Zhou, Houliang Zhou, Yao Su, Brian Y. Chen, Yu Zhang, Lifang He, Alzheimer's Disease Neuroimaging Initiative
arXiv (2026)
MM Benchmark

📝 Paper Summary

Medical Image Synthesis · Multimodal Learning · Missing Data Imputation
ACADiff synthesizes missing brain imaging modalities with a latent diffusion model that is conditioned on the available scans (through adaptive fusion) and on clinical metadata encoded by GPT-4o.
Core Problem
Multimodal Alzheimer's disease datasets frequently suffer from missing imaging modalities (MRI, PET) due to cost or patient dropout, which limits diagnostic accuracy and research utility.
Why it matters:
  • A single modality (e.g., MRI alone) misses complementary pathological markers available in PET scans (glucose metabolism, amyloid deposition)
  • Existing generative methods (GANs, standard diffusion) lack adaptive mechanisms to handle varying combinations of missing inputs (e.g., handling both 1→1 and 2→1 generation with one model)
  • Current approaches rarely integrate rich semantic clinical metadata (diagnosis, cognitive scores) to guide the generation process toward biologically plausible results
Concrete Example: A patient may have an MRI and clinical scores but lack FDG-PET due to cost. A standard diffusion model might generate a generic PET scan that looks realistic but ignores the patient's specific cognitive decline (e.g., MMSE=22), whereas ACADiff uses the clinical scores to generate a PET scan reflecting the appropriate hypometabolism patterns.
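The summary says clinical metadata reaches the model as text encoded by GPT-4o. As a rough illustration of the first half of that step, the sketch below renders scores such as MMSE=22 into a natural language prompt. The function name `build_clinical_prompt` and the sentence template are hypothetical, and the ADAS13 value in the usage lines is made up for illustration; the paper's actual prompt wording and the encoding call are not given here, so the encoding step is omitted.

```python
def build_clinical_prompt(diagnosis=None, mmse=None, adas13=None):
    """Render available clinical fields as one sentence (hypothetical template)."""
    facts = []
    if diagnosis is not None:
        facts.append(f"diagnosis {diagnosis}")
    if mmse is not None:
        facts.append(f"MMSE score {mmse}")
    if adas13 is not None:
        facts.append(f"ADAS13 score {adas13}")
    if not facts:
        # Fall back to a neutral prompt when no metadata was recorded.
        return "Patient with no clinical metadata available."
    return "Patient with " + ", ".join(facts) + "."


# Illustrative values only; the ADAS13 number is not from the paper.
print(build_clinical_prompt(diagnosis="MCI", mmse=22, adas13=18.5))
print(build_clinical_prompt(mmse=22))
```

Skipping absent fields keeps the prompt valid when a patient record is only partially filled in, which mirrors the missing-data setting the paper targets.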
Key Novelty
Adaptive Clinical-Aware Diffusion (ACADiff)
  • Adaptive Image Conditioning: A dynamic fusion module switches between cross-attention (when multiple inputs are available) and projection (when only a single input is available) based on availability masks, allowing one model to handle any combination of missing modalities.
  • Semantic Clinical Guidance: Instead of simple class labels, patient metadata (MMSE, ADAS13) is converted into natural language prompts and encoded by GPT-4o to guide the diffusion process toward disease-consistent patterns.
  • Hierarchical Conditioning: Combines early fusion of image features with late fusion of text embeddings in the diffusion U-Net to preserve both structural details and high-level semantic disease information.
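The mask-gated switch in the Adaptive Image Conditioning bullet can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the authors' implementation: the function names (`adaptive_fuse`, `cross_attention`, `project`), the choice of letting the first available modality supply the queries, the residual addition, and the single shared projection matrix `weight` are all hypothetical, and a real model would use learned tensors in a deep learning framework.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, context):
    """Scaled dot-product attention: each query token attends over context tokens."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in context]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, context)) for j in range(d)])
    return out


def project(tokens, weight):
    """Linear projection of every token: token @ weight."""
    cols = len(weight[0])
    return [[sum(t[i] * weight[i][j] for i in range(len(t))) for j in range(cols)]
            for t in tokens]


def adaptive_fuse(features, mask, weight):
    """features: name -> token list (or None); mask: name -> bool availability."""
    available = [name for name in features if mask.get(name)]
    if not available:
        raise ValueError("at least one modality must be available")
    if len(available) == 1:
        # Single input: plain projection, no attention needed.
        return project(features[available[0]], weight)
    # Multiple inputs: the first modality queries the tokens of all the others.
    first, rest = available[0], available[1:]
    context = [tok for name in rest for tok in features[name]]
    attended = cross_attention(features[first], context)
    fused = [[a + b for a, b in zip(q, c)] for q, c in zip(features[first], attended)]
    return project(fused, weight)


identity = [[1.0, 0.0], [0.0, 1.0]]
mri = [[1.0, 0.0], [0.0, 1.0]]
pet = [[1.0, 1.0]]
print(adaptive_fuse({"mri": mri, "pet": None}, {"mri": True, "pet": False}, identity))
print(adaptive_fuse({"mri": mri, "pet": pet}, {"mri": True, "pet": True}, identity))
```

Because the branch is chosen from the mask at call time, the same function serves both the 1→1 and 2→1 cases, which is the property the bullet attributes to the fusion module.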
Architecture
Fig. 1: The ACADiff framework, illustrating the latent diffusion process with hierarchical adaptive conditioning.
Evaluation Highlights
  • Achieves 89.4% diagnostic accuracy with 20% missing data (97.2% of the performance of complete real data).
  • Maintains 77.5% diagnostic accuracy even with 80% of the data missing, outperforming the best baseline (LDM), which achieves 76.4%.
  • Outperforms state-of-the-art baselines (LDM, PASTA, Pix2Pix) across all generation metrics, achieving a PSNR of 27.9 dB and an SSIM of 0.911.
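For readers less familiar with the PSNR figure quoted above, the sketch below shows the standard definition, 10·log10(MAX² / MSE), on flattened intensity lists. It is a generic illustration rather than the paper's evaluation code: the function name `psnr`, the `data_range` default of 1.0 (intensities assumed normalized to [0, 1]), and the toy inputs are assumptions.

```python
import math


def psnr(reference, generated, data_range=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length intensity lists."""
    if len(reference) != len(generated):
        raise ValueError("inputs must have the same number of voxels")
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(data_range ** 2 / mse)


# Toy example: a uniform error of 0.1 on a [0, 1] scale gives about 20 dB.
print(round(psnr([0.0, 0.0], [0.1, 0.1]), 2))
```

Higher values mean the generated scan is closer to the real one voxel by voxel; SSIM complements this by comparing local structure rather than raw intensity error.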
Breakthrough Assessment
8/10
Strong practical contribution solving the specific problem of variable missing modalities in medical imaging. The integration of LLM-encoded clinical text for guidance is a novel and effective addition to standard latent diffusion.