
A Survey of DeepSeek Models

Fnu Neha, Deepshikha Bhati
Weill Cornell Medicine, Cornell University, New York, New York, USA; Department of Engineering, Johns Hopkins University, Baltimore, Maryland, USA; Touro University College of Osteopathic Medicine, Middletown, New York, USA
2025 12th International Conference on Soft Computing & Machine Intelligence (ISCMI) (2025)
Reasoning · RL · Benchmark · Factuality · QA

📝 Paper Summary

Clinical Decision Support · Open-Source LLM Evaluation · AI Safety and Ethics
DeepSeek-R1 offers a cost-effective, open-source alternative for healthcare reasoning through a hybrid architecture, achieving near-proprietary performance in diagnostics while exhibiting higher risks of bias and adversarial vulnerability.
Core Problem
Proprietary healthcare AI models (like GPT-4o) are expensive and opaque, while existing open-source models often lack the reasoning depth required for complex clinical tasks.
Why it matters:
  • High computational costs and licensing fees restrict AI adoption in resource-constrained healthcare settings and developing regions
  • Closed-source 'black box' models prevent clinical auditing and transparency, which are critical for patient safety and regulatory compliance
  • A lack of accessible, high-reasoning models hinders the democratization of medical AI tools for diagnosis and education
Concrete Example: In ophthalmology, proprietary models such as OpenAI o1 achieve high diagnostic accuracy but at substantial cost. DeepSeek-R1 reaches comparable accuracy (82.0%) at roughly 15x lower inference cost, potentially enabling wider deployment in rural clinics.
Key Novelty
Hybrid Reasoning-Reinforcement Architecture (DeepSeek-R1)
  • Integrates Mixture of Experts (MoE) routing so that only a small subset of expert subnetworks is activated per token, drastically reducing inference cost while retaining high total parameter capacity (see the routing sketch after this list)
  • Employs Group Relative Policy Optimization (GRPO) in reinforcement learning to induce 'self-reflection', letting the model critique and refine its own reasoning chain (a minimal GRPO sketch follows the routing example)
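The MoE routing idea can be made concrete with a short sketch. The snippet below is a minimal, illustrative top-k router in PyTorch; the dimensions, expert count, and k are placeholder values, not DeepSeek's actual configuration, and the dense per-expert loop is written for clarity rather than efficiency.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Minimal sketch of Mixture-of-Experts routing: each token activates
    only its top-k experts, so per-token compute scales with k rather than
    with the total number of experts. All sizes here are illustrative."""
    def __init__(self, dim=512, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(dim, n_experts)  # router producing expert scores
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        ])

    def forward(self, x):                            # x: (tokens, dim)
        scores = self.gate(x)                        # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)   # keep only k experts/token
        weights = F.softmax(weights, dim=-1)         # normalize the kept scores
        out = torch.zeros_like(x)
        for slot in range(self.k):                   # dense loop, for clarity
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e             # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
        return out

# Toy usage: route 10 tokens of width 512.
moe = TopKMoE()
y = moe(torch.randn(10, 512))  # y: (10, 512)
```

The key point is that each token's forward pass touches only k of the expert MLPs, so inference cost stays roughly constant even as total parameter capacity grows.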
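GRPO's central trick is replacing a learned value critic with a group-relative baseline: sample several completions per prompt, reward each, and normalize rewards within the group. Below is a minimal sketch of that advantage computation, assuming simple scalar rewards (e.g., 0/1 correctness); the resulting advantages would then weight a clipped PPO-style policy update.

```python
import torch

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """Group-relative advantages: score each sampled completion against the
    mean and std of its own group instead of a learned critic.
    rewards: (n_prompts, group_size) scalar reward per sampled completion."""
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + 1e-6)  # group-normalized advantages

# Toy usage: 1 prompt, 4 sampled answers scored 0/1 for correctness.
adv = grpo_advantages(torch.tensor([[1.0, 0.0, 0.0, 1.0]]))
# Correct answers get positive advantage, incorrect ones negative.
```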
Evaluation Highlights
  • Achieves 86.7% accuracy on AIME 2024 (a mathematics benchmark) and ranks in the 96.3rd percentile on Codeforces, rivaling proprietary models
  • Matches OpenAI's o1 model with 82.0% accuracy on ophthalmology cases while incurring ~15x lower inference costs
  • Performs strongly in pediatric diagnostics (87.0% on MedQA), though it slightly trails OpenAI o1 (92.8%)
Breakthrough Assessment
7/10
Significant for democratizing high-level reasoning in healthcare via open weights and efficiency, but safety vulnerabilities and weaker general NLP fluency prevent a higher score.