| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| Safety and feasibility results demonstrate the system is viable for real-world deployment with supervision. | ||||
| Clinical Feasibility Cohort | Safety Stops | 0 | 0 | 0 |
| Diagnostic accuracy results show high concordance with ground truth derived from chart review. | ||||
| Clinical Feasibility Cohort | Inclusion of Final Diagnosis (%) | Not reported in the paper | 90 | Not reported in the paper |
| Clinical Feasibility Cohort | Top-3 Accuracy (%) | Not reported in the paper | 75 | Not reported in the paper |
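| Comparative ratings of AMIE and PCPs by blinded evaluators show no significant difference in DDx quality but significant differences in management plan (Mx) practicality and cost effectiveness; the Baseline column gives the 0.05 significance threshold. | ||||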
| Clinical Feasibility Cohort | DDx Quality (p-value) | 0.05 | 0.6 | Not applicable |
| Clinical Feasibility Cohort | Mx Practicality (p-value) | 0.05 | 0.003 | Not applicable |
| Clinical Feasibility Cohort | Mx Cost Effectiveness (p-value) | 0.05 | 0.004 | Not applicable |
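
The p-value rows above can be read by comparing each reported value against the 0.05 figure in the Baseline column. The sketch below does that comparison in Python. It assumes that 0.05 is the significance threshold rather than a measured baseline, and the names `ALPHA`, `p_values`, and `is_significant` are illustrative, not taken from the paper. It says nothing about which side (AMIE or PCPs) was rated higher, since the table does not report direction.

```python
# Assumption: the 0.05 in the Baseline column is the significance
# threshold (alpha), not a measured baseline result.
ALPHA = 0.05

# p-values copied from the "This Paper" column of the table.
p_values = {
    "DDx Quality": 0.6,
    "Mx Practicality": 0.003,
    "Mx Cost Effectiveness": 0.004,
}


def is_significant(p: float, alpha: float = ALPHA) -> bool:
    """Return True when the p-value falls below the threshold alpha."""
    return p < alpha


# Map each metric to whether its rating difference is significant.
significant = {metric: is_significant(p) for metric, p in p_values.items()}

for metric, flag in significant.items():
    label = "significant" if flag else "not significant"
    print(f"{metric}: p = {p_values[metric]} ({label} at alpha = {ALPHA})")
```

Under this reading, only the two management plan metrics show a statistically significant difference between AMIE and PCP ratings.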