MedGemma Technical Report

Andrew Sellergren, Sahar Kazemzadeh, Tiam Jaroensri, A. Kiraly, Madeleine Traverse, Timo Kohlberger, Shawn Xu, Fayaz Jamil, Cían Hughes, Charles Lau, J. Chen, Fereshteh Mahvar, L. Yatziv, Tiffany Chen, Bram Sterling, Stefanie Anna Baby, Susanna Maria Baby, Jeremy Lai, Samuel Schmidgall, Lu Yang, Kejia Chen, Per Bjornsson, Shashir Reddy, Ryan Brush, Kenneth Philbrick, M. Asiedu, Ines Mezerreg, Howard H. Hu, Howard H. Yang, Richa Tiwari, et al.
Google Research, Google DeepMind
arXiv (2025)
MM Pretraining RL QA Benchmark Agent

📝 Paper Summary

Medical Vision-Language Models · Foundation Models for Healthcare
MedGemma is a suite of open medical foundation models built on Gemma 3 that achieves state-of-the-art performance on medical tasks by combining a medically-tuned vision encoder (MedSigLIP) with specialized post-training.
Core Problem
General-purpose multimodal models often lack nuanced medical understanding and robust reasoning capabilities for diverse healthcare data types like radiology and histopathology.
Why it matters:
  • Generic models struggle with the specific vocabulary and visual patterns required for accurate diagnosis and treatment planning
  • Developing specialized models from scratch is resource-intensive; foundation models that require less task-specific tuning are critical for accelerating healthcare AI
  • Existing open models often lag behind closed proprietary models in specialized medical benchmarks
Concrete Example: In chest X-ray analysis, a generic model might identify a lung opacity but fail to distinguish atelectasis from pneumonia. It might also fail to follow the reporting style expected in clinical workflows (e.g., the radiology report format of the MIMIC-CXR dataset).
Key Novelty
MedGemma (Medical Vision-Language Foundation Model)
  • Integrates a specialized medical vision encoder (MedSigLIP) into the Gemma 3 architecture to enhance visual discrimination of subtle medical features
  • Utilizes a comprehensive post-training pipeline including distillation from larger medical models and reinforcement learning on medical image-text pairs
  • Releases a standalone lightweight medical image encoder (MedSigLIP) that performs well on zero-shot classification and retrieval
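MedSigLIP's zero-shot classification can be illustrated with the sigmoid scoring rule used by SigLIP-family encoders. Each image-label pair receives an independent probability σ(t·cos + b), where cos is the cosine similarity between the image embedding and the label's text embedding. This is a minimal sketch under stated assumptions. The three-dimensional embeddings, the label names, and the `scale`/`bias` values are hypothetical toy numbers, not outputs or learned parameters of MedSigLIP, and no real MedSigLIP API is called.

```python
import math
from typing import Dict, List

def l2_normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def zero_shot_scores(
    image_emb: List[float],
    label_embs: Dict[str, List[float]],
    scale: float = 10.0,  # hypothetical value; SigLIP-style models learn this temperature
    bias: float = -5.0,   # hypothetical value; SigLIP-style models learn this bias
) -> Dict[str, float]:
    """Score each label independently with sigmoid(scale * cos_sim + bias)."""
    img = l2_normalize(image_emb)
    scores = {}
    for label, emb in label_embs.items():
        txt = l2_normalize(emb)
        cos_sim = sum(a * b for a, b in zip(img, txt))
        scores[label] = sigmoid(scale * cos_sim + bias)
    return scores

# Toy embeddings standing in for image/text encoder outputs (hypothetical, not MedSigLIP).
image = [0.9, 0.1, 0.2]
labels = {
    "pneumonia": [1.0, 0.0, 0.1],
    "atelectasis": [0.0, 1.0, 0.3],
    "no finding": [0.1, 0.2, 1.0],
}
scores = zero_shot_scores(image, labels)
best = max(scores, key=scores.get)
print(best)  # pneumonia
```

Each label is scored with its own sigmoid rather than a softmax over all labels. As a result, the scores need not sum to 1, and several findings can be flagged for the same image. In practice the label embeddings would come from encoding text prompts (for example, a short description of the finding) with the paired text encoder.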
Evaluation Highlights
  • +15.5% to +18.1% improvement on out-of-distribution chest X-ray finding classification compared with the base Gemma models
  • Reduces errors in electronic health record (EHR) information retrieval by 50% after fine-tuning
  • MedGemma 4B outperforms significantly larger models, such as Med-Gemini 2D, on VQA benchmarks including SLAKE and VQA-RAD
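Percentage claims like these can be read as either relative or absolute changes, and the two readings give different numbers. This summary does not say which reading each highlight uses. The following minimal sketch uses hypothetical error rates that are not taken from the paper. Halving an error rate from 20% to 10% is a 50% relative reduction, but only a 10-percentage-point absolute drop.

```python
def relative_change(before: float, after: float) -> float:
    """Change relative to the baseline: (after - before) / before."""
    if before <= 0:
        raise ValueError("baseline rate must be positive")
    return (after - before) / before

def absolute_change_pp(before: float, after: float) -> float:
    """Change in percentage points, with rates given as fractions in [0, 1]."""
    return (after - before) * 100.0

# Hypothetical error rates, for illustration only (not results from the paper).
err_before, err_after = 0.20, 0.10
rel = relative_change(err_before, err_after)
abs_pp = absolute_change_pp(err_before, err_after)
print(f"relative: {rel:+.0%}")       # relative: -50%
print(f"absolute: {abs_pp:+.1f} pp")  # absolute: -10.0 pp
```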
Breakthrough Assessment
8/10
Strong performance for an open-weights model family, particularly the 4B variant, which outperforms larger prior state-of-the-art models. The release of the standalone MedSigLIP encoder is a significant utility for the medical AI community.