
Segment anything in medical images

Jun Ma, Yuting He, Feifei Li, Lin Han, Chenyu You, Bo Wang
Peter Munk Cardiac Centre, University Health Network, Toronto, Canada; Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Canada; Vector Institute, Toronto, Canada; Department of Computer Science, Western University, Canada; Tandon School of Engineering, New York University, USA; Department of Electrical Engineering, Yale University, USA
Nature Communications (2024)
MM Pretraining Benchmark

📝 Paper Summary

Medical Image Segmentation Foundation Models
MedSAM is a universal medical image segmentation model built by fine-tuning the Segment Anything Model (SAM) on a dataset of over 1.5 million image-mask pairs across 10 imaging modalities.
Core Problem
Existing medical segmentation models are task-specific, lacking generalization across the diverse spectrum of medical modalities and targets, while the natural image foundation model (SAM) performs poorly on medical targets with weak boundaries.
Why it matters:
  • Task-specific models require training separate networks for every new organ or disease, which is inefficient and scales poorly.
  • Clinical workflows involve diverse imaging (CT, MRI, Ultrasound) and variable targets (tumors, organs), requiring a universal tool rather than fragmented solutions.
  • Out-of-the-box SAM struggles with low-contrast medical targets, limiting its immediate utility in high-stakes clinical diagnosis.
Concrete Example: When applied to a liver tumor in a CT scan, standard SAM often fails due to weak boundaries between the tumor and surrounding tissue. In contrast, MedSAM, fine-tuned on medical data, accurately delineates the tumor given the same bounding box prompt.
Key Novelty
MedSAM (Medical Segment Anything Model)
  • Adapts the Segment Anything Model (SAM) to the medical domain by fine-tuning the image encoder and mask decoder on a curated large-scale medical dataset while keeping the prompt encoder frozen.
  • Consolidates over 1.5 million image-mask pairs from diverse public datasets into a unified format to enable universal training across 10 distinct imaging modalities.
  • Utilizes a prompt-based approach (bounding boxes) to handle the ambiguity of medical segmentation tasks (e.g., segmenting a whole organ vs. a specific tumor) within a single model.
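A bounding-box prompt of the kind described in the last bullet can be derived from an existing mask, for example when simulating prompts from ground-truth annotations. The sketch below is illustrative only: `mask_to_box` and its `pad` parameter are hypothetical names, not part of MedSAM or SAM, and the mask is a plain nested list of 0/1 values rather than a real image array.

```python
def mask_to_box(mask, pad=0):
    """Return (x_min, y_min, x_max, y_max) enclosing all nonzero pixels.

    `mask` is a list of rows of 0/1 values. `pad` widens the box by that many
    pixels on each side, clipped to the image bounds. Returns None when the
    mask has no foreground pixels.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    # Indices of rows and columns that contain at least one foreground pixel.
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for x in range(width) if any(row[x] for row in mask)]
    if not rows:
        return None
    x_min = max(cols[0] - pad, 0)
    y_min = max(rows[0] - pad, 0)
    x_max = min(cols[-1] + pad, width - 1)
    y_max = min(rows[-1] + pad, height - 1)
    return (x_min, y_min, x_max, y_max)


# Toy 5x6 mask with a small target; the values are made up for illustration.
toy_mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
tight_box = mask_to_box(toy_mask)         # box hugging the target
loose_box = mask_to_box(toy_mask, pad=2)  # same box widened, clipped to the image
```

Which structure the box is drawn around (a whole organ or a lesion inside it) is what tells a prompt-based model which of the possible targets to segment.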
Evaluation Highlights
  • Outperforms the specialist U-Net model by 15.5 percentage points on the unseen external validation task of nasopharynx cancer segmentation (median DSC: 87.8% vs. 72.3%).
  • Surpasses standard SAM by a large margin on challenging internal tasks such as liver tumor segmentation (median DSC improvement of roughly 30-40%, visually estimated from plots).
  • Reduces 3D tumor annotation time by 82.37% for human experts compared to slice-by-slice manual segmentation.
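The Dice similarity coefficient (DSC) behind these comparisons measures overlap between a predicted mask A and a reference mask B as DSC = 2|A∩B| / (|A| + |B|). A minimal sketch follows, assuming masks flattened to 0/1 lists. The `dice` helper and the toy masks are hypothetical; only the 87.8 and 72.3 values come from the highlights above.

```python
def dice(pred, truth):
    """Dice similarity coefficient between two equal-length 0/1 sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * overlap / total


# Toy flattened masks (made-up values): 3 overlapping pixels, 4 + 4 foreground.
pred = [0, 1, 1, 1, 1, 0, 0, 0]
truth = [0, 0, 1, 1, 1, 1, 0, 0]
toy_dsc = dice(pred, truth)  # 2 * 3 / (4 + 4) = 0.75

# Median DSC values quoted above for the nasopharynx cancer task.
medsam_dsc, unet_dsc = 87.8, 72.3
# Absolute difference, in percentage points.
absolute_gain = round(medsam_dsc - unet_dsc, 1)
# Relative improvement, as a percent of the U-Net score.
relative_gain = round(100 * (medsam_dsc - unet_dsc) / unet_dsc, 1)
```

The 15.5 figure is the absolute difference in percentage points. Measured relative to the U-Net score, the same gap is about 21.4%.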
Breakthrough Assessment
9/10
Establishes the first large-scale, universal foundation model for medical image segmentation. It demonstrates that a single model can rival or beat specialist models across diverse modalities, marking a shift away from task-specific training.