Performance on WSI-Bench Report Generation and Diagnosis tasks.

| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| WSI-Bench (Report Generation) | WSI-Precision | Not reported in the paper | Not reported in the paper | +12.8% |
| WSI-Bench (Report Generation) | WSI-Relevance | Not reported in the paper | Not reported in the paper | +10.1% |
| WSI-Bench (Open-ended diagnosis) | Accuracy/Score | Not reported in the paper | Not reported in the paper | +9.7% |
| WSI-Bench (Open-ended diagnosis - Relevance) | Relevance Score | Not reported in the paper | Not reported in the paper | +8.9% |