
A co-evolving agentic AI system for medical imaging analysis

Songhao Li, Jonathan Xu, Tiancheng Bao, Yuxuan Liu, Yucheng Liu, Yihang Liu, Lilin Wang, Wenhui Lei, Sheng Wang, Yinuo Xu, Yan Cui, Jialu Yao, Shunsuke Koga, Zhi Huang
University of Pennsylvania
arXiv.org (2025)
Agent MM RAG Factuality Benchmark

📝 Paper Summary

Medical Agentic AI Tool-use and Workflow Planning
TissueLab is an agentic AI system that orchestrates specialized medical tools into executable imaging workflows, using clinician feedback and active learning to refine lightweight models in real time.
Core Problem
Medical image analysis requires highly specialized, manually constructed pipelines that VLMs cannot generate reliably due to hallucination, lack of specific quantification tools, and inability to adapt to new disease morphologies without retraining.
Why it matters:
  • Clinicians rely on precise quantifications (e.g., tumor-to-duct ratio) for staging and treatment, which general VLMs fail to provide accurately
  • Existing agentic systems rely on fixed toolboxes that become obsolete and lack mechanisms for real-time expert refinement or preference retention
  • The gap between AI research and clinical adoption is widened by 'black box' models that cannot be inspected, corrected, or grounded in authoritative guidelines
Concrete Example: When asked to calculate tumor invasion depth, GPT-4o-vision produces a hallucinated estimate that correlates poorly with expert measurements (Pearson ρ=0.37) because it cannot perform precise geometric measurement. TissueLab instead constructs a workflow that segments the tissue, extracts contours, and computes the distance directly, reaching expert-level correlation (Pearson ρ=0.843).
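The segment, contour, and distance steps in this example can be sketched in a few lines. This is a minimal illustration and not TissueLab's implementation: the function names, the 4-neighbour contour rule, the `mm_per_pixel` parameter, and the definition of depth as the largest distance from any tumor pixel to the nearest tissue-surface pixel are all assumptions made here, and the masks are assumed to come from upstream segmentation models.

```python
import math


def contour_pixels(mask):
    """Return (row, col) of foreground pixels that have a background 4-neighbour.

    Pixels on the image border are not treated as contour unless they touch
    background inside the image (an assumption of this sketch).
    """
    rows, cols = len(mask), len(mask[0])
    contour = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            neighbours = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            if any(0 <= nr < rows and 0 <= nc < cols and not mask[nr][nc]
                   for nr, nc in neighbours):
                contour.append((r, c))
    return contour


def invasion_depth(tissue_mask, tumor_mask, mm_per_pixel=1.0):
    """Largest distance from a tumor pixel to the nearest tissue-surface pixel."""
    surface = contour_pixels(tissue_mask)
    tumor = [(r, c)
             for r, row in enumerate(tumor_mask)
             for c, value in enumerate(row) if value]
    if not surface or not tumor:
        raise ValueError("both a tissue surface and a tumor region are required")
    deepest = max(min(math.dist(t, s) for s in surface) for t in tumor)
    return deepest * mm_per_pixel


# Toy 6x6 image: row 0 is background, rows 1-5 are tissue.
tissue = [[0] * 6] + [[1] * 6 for _ in range(5)]
tumor = [[0] * 6 for _ in range(6)]
tumor[3][2] = tumor[4][2] = 1  # tumor extends 3 pixels below the surface row

depth_mm = invasion_depth(tissue, tumor, mm_per_pixel=0.5)
print(depth_mm)
```

The point of the sketch is that the final number comes from explicit geometry on masks, so every intermediate result (mask, contour, distance) can be inspected and corrected, which a direct VLM estimate does not allow.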
Key Novelty
Co-evolving Agentic Ecosystem (TissueLab)
  • Modules are 'co-evolving': clinician feedback on intermediate results (e.g., segmentation errors) is immediately converted into training data for lightweight model fine-tuning via active learning
  • Combines LLM orchestration with a 'Factory Method' architecture where diverse specialized models are wrapped as standardized plugins, enabling dynamic workflow planning
  • Integrates the Model Context Protocol (MCP) to retrieve live, authoritative clinical guidelines (e.g., AJCC) to ground diagnostic reasoning in external standards rather than model weights
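One way to picture the 'Factory Method' plugin idea is a registry that maps tool names to classes sharing a single interface, so that a planner only has to emit a list of names. The sketch below is hypothetical: `ToolPlugin`, `register`, `create_tool`, `run_workflow`, and the two toy tools are illustrative names invented here, not TissueLab's API, and the fixed threshold stands in for a real segmentation model.

```python
from abc import ABC, abstractmethod


class ToolPlugin(ABC):
    """Common interface every wrapped model exposes (hypothetical)."""

    name: str

    @abstractmethod
    def run(self, payload: dict) -> dict:
        """Consume the shared payload and return an updated copy."""


_REGISTRY = {}


def register(cls):
    """Class decorator that makes a plugin discoverable by name."""
    _REGISTRY[cls.name] = cls
    return cls


def create_tool(name):
    """Factory method: build a plugin instance from its registered name."""
    if name not in _REGISTRY:
        raise ValueError(f"unknown tool: {name!r}")
    return _REGISTRY[name]()


@register
class ThresholdSegmenter(ToolPlugin):
    name = "segment"

    def run(self, payload):
        # Placeholder for a real segmentation model: fixed threshold at 0.5.
        mask = [[1 if v > 0.5 else 0 for v in row] for row in payload["pixels"]]
        return {**payload, "mask": mask}


@register
class ForegroundFraction(ToolPlugin):
    name = "foreground_fraction"

    def run(self, payload):
        mask = payload["mask"]
        total = sum(len(row) for row in mask)
        return {**payload, "foreground_fraction": sum(map(sum, mask)) / total}


def run_workflow(plan, payload):
    """Execute a plan (a list of tool names) step by step."""
    for step in plan:
        payload = create_tool(step).run(payload)
    return payload


result = run_workflow(["segment", "foreground_fraction"],
                      {"pixels": [[0.9, 0.1], [0.8, 0.7]]})
print(result["foreground_fraction"])
```

Because each step only reads from and writes to the shared payload, a new model can be added by registering one more class, and a plan produced by an LLM stays a plain list of strings that can be validated before execution.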
Architecture
Figure 1: The TissueLab ecosystem architecture, illustrating the flow from user query to tool selection, workflow generation, distributed inference, and feedback loops.
Evaluation Highlights
  • Achieved 99.8% accuracy in prostate tumor-to-duct ratio measurement after 2 minutes of active learning feedback, compared to <12% for most baselines
  • Attained a Pearson correlation of 0.843 with expert annotations for tumor invasion depth, compared with 0.376 for GPT-4o-agent
  • Raised mean AUC from 0.6959 to 0.8284 on NIH Chest X-ray classification by leveraging candidate pooling and clinician preference updates
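For readers less familiar with the two metrics quoted above, the following sketch shows how a Pearson correlation and a mean AUC over several labels are commonly computed. It uses only the standard library, the inputs are made-up toy values rather than data from the paper, and the paper's exact evaluation protocol (for example, how labels are averaged) is not specified here, so the unweighted mean is an assumption.

```python
import math
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_auc(label_columns, score_columns):
    """Unweighted mean of per-label AUCs (one column per finding)."""
    return mean(auc(y, s) for y, s in zip(label_columns, score_columns))


# Toy values only, not results from the paper.
r = pearson([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
single = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
overall = mean_auc([[0, 0, 1, 1], [0, 1, 0, 1]],
                   [[0.1, 0.4, 0.35, 0.8], [0.2, 0.9, 0.3, 0.7]])
print(r, single, overall)
```

The pairwise AUC is quadratic in the number of samples, which is fine for illustration; a rank-based formulation would be the usual choice on a full test set.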
Breakthrough Assessment
9/10
A major step forward in medical agents, moving beyond simple VLM prompting to a system that builds executable code workflows, integrates real-time active learning, and grounds decisions in retrieval-augmented guidelines.