Top@1: Accuracy metric checking if the primary (first-ranked) diagnosis matches the gold standard.
Top@4: Accuracy metric checking if the gold standard diagnosis appears anywhere in the top 4 predicted diagnoses.
Differential Diagnosis (DDx): The process of distinguishing between two or more conditions that share similar signs or symptoms.
Serial Collaboration: A workflow where the physician diagnoses independently first, then reviews AI suggestions to revise their decision.
Concurrent Collaboration: A workflow where the physician receives AI assistance (as a co-pilot) while reviewing the case material.
Automation Bias: The tendency of humans to over-rely on automated systems, potentially accepting incorrect AI suggestions.
Incidence Tiers: Categorization of diseases based on their frequency in the population (e.g., Common, Rare, Ultra-rare).
Reasoning-oriented LLM: An LLM optimized to generate intermediate reasoning steps (chains of thought) before producing a final answer.
GPT-5.1: The model version referenced in the paper as an automated evaluator for semantic consistency.
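
The Top@1 and Top@4 definitions above can be illustrated with a minimal sketch. It assumes each case is a ranked list of predicted diagnoses paired with one gold-standard label, and it matches labels by normalized string equality. That matching rule is a simplification chosen for illustration: the paper relies on an automated evaluator for semantic consistency, which this sketch does not reproduce. The function names (`normalize`, `top_k_hit`, `top_k_accuracy`) and the toy cases are hypothetical.

```python
from typing import Sequence, Tuple


def normalize(label: str) -> str:
    """Lowercase and collapse whitespace so trivially different spellings compare equal."""
    return " ".join(label.lower().split())


def top_k_hit(ranked: Sequence[str], gold: str, k: int) -> bool:
    """Return True if the gold diagnosis appears among the first k ranked predictions."""
    target = normalize(gold)
    return any(normalize(p) == target for p in ranked[:k])


def top_k_accuracy(cases: Sequence[Tuple[Sequence[str], str]], k: int) -> float:
    """Fraction of cases whose gold diagnosis is within the top k predictions."""
    if not cases:
        raise ValueError("cases must not be empty")
    hits = sum(top_k_hit(ranked, gold, k) for ranked, gold in cases)
    return hits / len(cases)


# Hypothetical toy data, invented for illustration only (not from the paper).
cases = [
    (["Pneumonia", "Bronchitis", "Asthma", "Pulmonary embolism"], "pneumonia"),
    (["Migraine", "Tension headache", "Cluster headache", "Sinusitis"], "Cluster headache"),
    (["Gastritis", "Peptic ulcer", "Pancreatitis", "Cholecystitis"], "Appendicitis"),
]

top1 = top_k_accuracy(cases, k=1)  # gold must be the first-ranked diagnosis
top4 = top_k_accuracy(cases, k=4)  # gold may be anywhere in the top 4
print(f"Top@1 = {top1:.3f}, Top@4 = {top4:.3f}")
```

Because the top-4 list always contains the top-1 prediction, Top@4 can never be lower than Top@1 on the same set of cases.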