| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| MedCollab improves diagnostic accuracy and department routing on ClinicalBench over the strongest baselines (+8.2 ACC, +13.1 CDR). | | | | |
| ClinicalBench | Accuracy (ACC) | 68.7 | 76.9 | +8.2 |
| ClinicalBench | Comprehensive Diagnostic Rate (CDR) | 59.3 | 72.4 | +13.1 |
| MIMIC-IV | Accuracy (ACC) | Not reported in the paper | 57.7 | Not reported in the paper |
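| Ablation studies show that the Logic Auditing and Causal Chain components are critical: accuracy drops by up to 27.2 points without them. In the rows below, Baseline is the full MedCollab model and This Paper is the ablated variant. | | | | |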
| ClinicalBench | Accuracy (ACC) | 76.9 | 49.7 | -27.2 |
| ClinicalBench | Accuracy (ACC) | 76.9 | 52.9 | -24.0 |
| ClinicalBench | RaTEScore (Diagnostic Basis) | 62.0 | 51.7 | -10.3 |
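
The Δ column is read here as This Paper minus Baseline, rounded to one decimal place. The following is a minimal sketch for re-deriving it, assuming that reading; the `rows` list and the `delta` helper are illustrative names, not part of the paper, and the "(ablation)" labels are added only to tell the rows apart. The MIMIC-IV row is omitted because it has no reported baseline.

```python
# Rows copied from the table above:
# (benchmark, metric, baseline, this_paper, reported_delta)
rows = [
    ("ClinicalBench", "ACC", 68.7, 76.9, 8.2),
    ("ClinicalBench", "CDR", 59.3, 72.4, 13.1),
    ("ClinicalBench", "ACC (ablation)", 76.9, 49.7, -27.2),
    ("ClinicalBench", "ACC (ablation)", 76.9, 52.9, -24.0),
    ("ClinicalBench", "RaTEScore (ablation)", 62.0, 51.7, -10.3),
]


def delta(baseline: float, score: float) -> float:
    """Difference in points, rounded to one decimal like the table."""
    return round(score - baseline, 1)


# Collect any row whose reported delta disagrees with the recomputed one.
mismatches = [r for r in rows if delta(r[2], r[3]) != r[4]]

for benchmark, metric, baseline, score, reported in rows:
    print(f"{benchmark:14s} {metric:22s} {delta(baseline, score):+.1f}")
```

Under this reading, `mismatches` comes out empty, so every reported Δ matches the difference of its two score columns.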