| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| **MedGraphRAG achieves state-of-the-art (SOTA) accuracy on major medical Q&A benchmarks using GPT-4 as the base model.** | | | | |
| PubMedQA | Accuracy | 79.20 | 81.73 | +2.53 |
| MedQA | Accuracy | 81.08 | 82.35 | +1.27 |
| MedMCQA | Accuracy | 71.32 | 74.71 | +3.39 |
| **In long-form generation tasks, MedGraphRAG produces more comprehensive and more diverse answers than standard GraphRAG.** | | | | |
| Li's Dataset | Comprehensiveness | 47.92 | 68.64 | +20.72 |
| Li's Dataset | Diversity | 52.08 | 68.61 | +16.53 |
| **Human evaluation by clinicians indicates significantly higher source utilization and slightly more useful answers.** | | | | |
| Internal | Source Utilization Rate | 29.27 | 63.82 | +34.55 |
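
As a quick consistency check, the Δ column can be reproduced as This Paper minus Baseline. The following is a minimal sketch, assuming Δ is the absolute difference in each metric's own units (not a relative improvement). The row tuples are transcribed from the table, and `absolute_delta` and `check_rows` are hypothetical helper names introduced only for this illustration.

```python
from decimal import Decimal

# Rows transcribed from the table: (benchmark, metric, baseline, this_paper, reported_delta).
# Values are kept as strings so Decimal parses them exactly.
ROWS = [
    ("PubMedQA", "Accuracy", "79.20", "81.73", "+2.53"),
    ("MedQA", "Accuracy", "81.08", "82.35", "+1.27"),
    ("MedMCQA", "Accuracy", "71.32", "74.71", "+3.39"),
    ("Li's Dataset", "Comprehensiveness", "47.92", "68.64", "+20.72"),
    ("Li's Dataset", "Diversity", "52.08", "68.61", "+16.53"),
    ("Internal", "Source Utilization Rate", "29.27", "63.82", "+34.55"),
]


def absolute_delta(baseline: str, this_paper: str) -> Decimal:
    """Return This Paper minus Baseline (assumed definition of the Δ column)."""
    return Decimal(this_paper) - Decimal(baseline)


def check_rows(rows):
    """Yield (benchmark, metric, computed, reported, matches) for each table row."""
    for bench, metric, base, ours, reported in rows:
        computed = absolute_delta(base, ours)
        reported_value = Decimal(reported)
        yield bench, metric, computed, reported_value, computed == reported_value


if __name__ == "__main__":
    for bench, metric, computed, reported, ok in check_rows(ROWS):
        print(f"{bench:<13} {metric:<24} computed={computed} reported={reported} ok={ok}")
```

Using `Decimal` rather than `float` keeps the two-decimal values exact, so recomputed and reported deltas can be compared for equality without a tolerance.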