Phenopackets: A standardized open format for sharing disease and phenotype information, used here to unify diverse patient data sources
RAG: Retrieval-Augmented Generation—combining information retrieval with LLM generation to ground answers in specific data
CoT: Chain-of-Thought—a prompting technique where the model generates intermediate reasoning steps before the final answer
NER: Named Entity Recognition—identifying specific terms like diseases, genes, or drugs within unstructured text
BM25: A ranking function used in information retrieval that scores a document's relevance to a search query from term frequency, inverse document frequency, and document length normalization
BGE-M3: A multilingual embedding model, used here to convert text into dense vector representations for semantic search
nDCG: Normalized Discounted Cumulative Gain—a measure of ranking quality that prioritizes relevant items appearing earlier in the list
BioSyn: A method for biomedical entity normalization, mapping text mentions to standard ontology concepts
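
The two ranking-related entries, BM25 and nDCG, can be made concrete with a small sketch. This is an illustration under stated assumptions, not the pipeline this glossary belongs to: the function names (`bm25_scores`, `dcg`, `ndcg`), the parameter defaults `k1=1.5` and `b=0.75`, the smoothed IDF variant, and the toy tokenized documents are all hypothetical choices made for the example.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query (Okapi BM25).

    k1 and b are common default values, assumed here for illustration.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF; the "+ 1.0" inside the log keeps it non-negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Length normalization: longer-than-average documents are penalized.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def dcg(relevances):
    """Discounted cumulative gain: gains are discounted by log2 of the rank."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg(relevances, k=None):
    """nDCG of a ranked list of graded relevance labels.

    Simplification: the ideal ordering is built from the labels of the
    retrieved list itself, not from every judged document in a collection.
    """
    ideal_dcg = dcg(sorted(relevances, reverse=True)[:k])
    return dcg(relevances[:k]) / ideal_dcg if ideal_dcg > 0 else 0.0


# Toy, made-up corpus of already-tokenized snippets.
docs = [
    ["marfan", "syndrome", "fbn1", "variant"],
    ["fever", "cough"],
    ["fbn1", "gene", "panel", "result", "fbn1"],
]
scores = bm25_scores(["fbn1", "variant"], docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

Calling `ndcg([3, 2, 1])` on a perfectly ordered list gives 1.0, while a list with the most relevant item placed last, such as `ndcg([1, 2, 3])`, scores lower. That is the sense in which nDCG favors relevant items appearing earlier.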