RARL: Improving Medical VLM Reasoning and Generalization with Reinforcement Learning and LoRA under Data and Hardware Constraints

Tan-Hanh Pham, Chris Ngo
Harvard Medical School, USA; Knovel Engineering Lab, Singapore
arXiv.org (2025)
MM RL Reasoning QA Benchmark

📝 Paper Summary

Medical Vision-Language Models (VLMs), Efficient Fine-tuning, Reinforcement Learning
RARL enhances the reasoning transparency and accuracy of small medical vision-language models on consumer hardware by combining efficient reinforcement learning with reasoning-specific rewards.
Core Problem
Medical VLMs typically require massive computational resources and large datasets, yet often fail to generalize to new clinical scenarios or provide transparent, step-by-step reasoning for their diagnoses.
Why it matters:
  • High computational costs (clusters of A100s) prevent deployment in smaller healthcare institutions or low-resource settings
  • Lack of transparent reasoning ('black box' answers) undermines clinical trust and accountability in high-stakes medical decision-making
  • Models trained on specific hospital data often fail when encountering different imaging protocols or demographics (poor generalization)
Concrete Example: A base model might correctly identify 'pneumonia' from an X-ray but fail to explain why, or hallucinate 'lung cancer' as a possibility without justification. Without explicit reasoning guidance, models memorize visual patterns ('bright spot = tumor') rather than understanding underlying pathological features.
Key Novelty
Reasoning-Aware Reinforcement Learning (RARL) with LoRA
  • Incentivizes the model to generate explicit 'thinking' steps (enclosed in tags) before answering, using a reward system that evaluates both reasoning quality and final answer correctness
  • Uses Group Relative Policy Optimization (GRPO) to train efficiently without a value network, combined with Low-Rank Adaptation (LoRA) to enable training on a single GPU
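The reward and group-relative scoring described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `<think>`/`<answer>` tag names, the reward weights, and the exact-match answer check are all assumptions, and a simple format check stands in for whatever reasoning-quality evaluation the paper uses. The advantage normalization (each reward minus the group mean, divided by the group standard deviation) is the general GRPO idea that removes the need for a learned value network.

```python
import re
from statistics import mean, pstdev

# Hypothetical tag names: the summary only says reasoning is "enclosed in tags".
TAGGED = re.compile(r"<think>(.+?)</think>\s*<answer>(.+?)</answer>", re.DOTALL)


def reward(completion: str, gold: str,
           w_format: float = 0.5, w_answer: float = 1.0) -> float:
    """Score one completion: format credit plus answer credit.

    The weights are illustrative assumptions, not values from the paper.
    """
    match = TAGGED.fullmatch(completion.strip())
    if match is None:
        return 0.0  # no explicit reasoning block, so no reward at all
    answer = match.group(2).strip().lower()
    correct = answer == gold.strip().lower()
    return w_format + (w_answer if correct else 0.0)


def group_relative_advantages(rewards, eps: float = 1e-6):
    """Normalize rewards within one group of samples for the same prompt.

    The group mean acts as the baseline, so no value network is required.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled completions for one question whose reference answer is "pneumonia".
group = [
    "<think>Opacity in the right lower lobe.</think><answer>Pneumonia</answer>",
    "<think>Bright spot near the hilum.</think><answer>Lung cancer</answer>",
    "Pneumonia",
    "<think>Consolidation with air bronchograms.</think><answer>pneumonia</answer>",
]
rewards = [reward(c, "pneumonia") for c in group]
advantages = group_relative_advantages(rewards)
```

In this sketch, completions that reason explicitly and answer correctly receive positive advantages, while the untagged answer is pushed down even though its label is right, which mirrors the incentive to "think" before answering.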
Evaluation Highlights
  • Outperforms supervised fine-tuning on reasoning-focused medical tasks by approximately 7.78% (human evaluation)
  • Achieves a ~27% performance gain on unseen datasets (e.g., VQA-RAD) relative to supervised fine-tuning baselines
  • Demonstrates feasibility of training a reasoning-capable VLM on a single NVIDIA A100-40GB GPU
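To give a rough sense of why LoRA helps training fit on a single GPU, the sketch below compares trainable parameter counts for one weight matrix. The layer size (4096 x 4096) and rank (16) are hypothetical values chosen for illustration, not figures reported in the paper. The only fact used is that LoRA replaces the update to a d_out x d_in matrix with two low-rank factors of shapes d_out x r and r x d_in.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole weight matrix is fine-tuned."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for the two LoRA factors B (d_out x r) and A (r x d_in)."""
    return rank * (d_out + d_in)


# Hypothetical layer size and rank, for illustration only.
d_out, d_in, rank = 4096, 4096, 16
full = full_params(d_out, d_in)        # 16,777,216
lora = lora_params(d_out, d_in, rank)  # 131,072
fraction = lora / full                 # 0.0078125, i.e. under 1% of the full matrix
```

Because optimizer state and gradients are only kept for the trainable parameters, a reduction of this kind per adapted layer is what makes a constrained memory budget plausible, though the actual savings depend on which layers are adapted and at what rank.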
Breakthrough Assessment
8/10
Significant for demonstrating that high-quality medical reasoning doesn't require massive clusters; the single-GPU constraint makes advanced VLM capabilities accessible to resource-constrained clinical settings.