| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| Baseline analysis reveals that general-purpose and medical LLMs struggle with precision and tend to overprescribe relative to the ground truth. | ||||
| MIMIC-III | F1 score | 0.3542 | Not reported in the paper | Not reported in the paper |
| MIMIC-III | #Med (avg. number of medications recommended) | 22.93 | Not reported in the paper | Not reported in the paper |
| Ablation studies confirm the necessity of processing clinical notes and using concise titles. | ||||
| MIMIC-III | Precision | Not reported in the paper | Not reported in the paper | Not reported in the paper |