VQA: Visual Question Answering—answering natural language questions about an image
LLaVA-Med: A large language and vision assistant specifically trained on biomedical images and text
Grounding DINO: An open-set object detector that localizes arbitrary objects, outputting bounding boxes, based on free-form text descriptions
SAM: Segment Anything Model—a foundation model for image segmentation
MedSAM: A version of SAM fine-tuned for medical images
RAG: Retrieval-Augmented Generation—enhancing model responses by retrieving relevant information from external knowledge bases
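IoU: Intersection over Union, the area of overlap between a predicted bounding box or mask and the ground truth divided by the area of their union; it ranges from 0 (no overlap) to 1 (exact match)
Dice score: Also called the Dice coefficient or Dice similarity coefficient (DSC), 2|A∩B| / (|A| + |B|) for two sets A and B; the standard overlap metric for comparing binary segmentation masks, ranging from 0 to 1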
RadFM: A generalist foundation model for radiology
BiomedCLIP: A vision-language foundation model pre-trained contrastively, in the style of CLIP, on image-caption pairs drawn from biomedical literature
MIMIC-CXR: A large publicly available dataset of chest radiographs with radiology reports
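Grounding: Identifying and localizing the specific objects within an image that a text query refers to, usually with bounding boxes
CIDEr: Consensus-based Image Description Evaluation, a captioning metric that scores a generated caption by its TF-IDF-weighted n-gram similarity to a set of human-written reference captions
BLEU: Bilingual Evaluation Understudy, a metric originally designed for machine translation that scores candidate text by its clipped n-gram precision against one or more reference texts, multiplied by a brevity penalty; also widely used for generated captions and reports
ROUGE-L: Recall-Oriented Understudy for Gisting Evaluation (Longest Common Subsequence), a text-summarization metric that computes an F-measure from the length of the longest common subsequence between candidate and reference text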
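RAG, sketched: the retrieval half of retrieval-augmented generation, assuming a tiny in-memory corpus and bag-of-words cosine similarity in place of the learned embeddings and vector index a real system would use. The helper names (bow, retrieve, build_prompt) and the toy corpus are hypothetical, and the generation step (sending the prompt to a language model) is omitted.

```python
import math
from collections import Counter
from typing import List


def bow(text: str) -> Counter:
    # Bag of lowercased whitespace tokens; stands in for a learned text embedding
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: List[str], k: int = 2) -> List[str]:
    # Rank every document by similarity to the query and keep the top k
    q = bow(query)
    return sorted(docs, key=lambda d: cosine(q, bow(d)), reverse=True)[:k]


def build_prompt(query: str, docs: List[str], k: int = 2) -> str:
    # Prepend the retrieved passages so the generator can condition on them
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


# Hypothetical toy knowledge base
corpus = [
    "pneumothorax is air in the pleural space",
    "cardiomegaly is an enlarged heart",
    "pleural effusion is fluid in the pleural space",
]
print(build_prompt("what is a pneumothorax", corpus, k=1))
```

In a full system the returned prompt would be passed to the generator model; replacing bow with an embedding model and the linear sort with a nearest-neighbor index are the usual substitutions.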
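IoU, sketched: intersection over union for two axis-aligned bounding boxes, assuming continuous (x_min, y_min, x_max, y_max) coordinates; some pixel-indexed conventions add 1 to each width and height. The function name box_iou is illustrative.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def box_iou(a: Box, b: Box) -> float:
    # Corners of the intersection rectangle
    ix_min = max(a[0], b[0])
    iy_min = max(a[1], b[1])
    ix_max = min(a[2], b[2])
    iy_max = min(a[3], b[3])
    # Width/height clamp to zero when the boxes do not overlap
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap 1, union 7 -> 1/7
```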
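Dice score, sketched: the Dice coefficient 2|A∩B| / (|A| + |B|) next to mask IoU, for binary masks given as nested lists of 0/1 values. Plain Python keeps it self-contained; in practice the same reductions are done with array operations. Returning 1.0 when both masks are empty is an assumed convention, and libraries differ on it.

```python
from typing import List

Mask = List[List[int]]  # 2-D binary mask, 1 = foreground


def _flatten(mask: Mask) -> List[bool]:
    return [bool(v) for row in mask for v in row]


def dice_score(pred: Mask, target: Mask) -> float:
    # Dice = 2|A ∩ B| / (|A| + |B|)
    p, t = _flatten(pred), _flatten(target)
    if len(p) != len(t):
        raise ValueError("masks must contain the same number of pixels")
    inter = sum(a and b for a, b in zip(p, t))
    total = sum(p) + sum(t)
    return 1.0 if total == 0 else 2.0 * inter / total  # empty vs empty -> 1.0 (assumed)


def mask_iou(pred: Mask, target: Mask) -> float:
    # IoU = |A ∩ B| / |A ∪ B|
    p, t = _flatten(pred), _flatten(target)
    if len(p) != len(t):
        raise ValueError("masks must contain the same number of pixels")
    inter = sum(a and b for a, b in zip(p, t))
    union = sum(a or b for a, b in zip(p, t))
    return 1.0 if union == 0 else inter / union


pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
print(dice_score(pred, target))  # 2*2 / (3+3)
print(mask_iou(pred, target))    # 2 / 4
```

For any pair of masks Dice = 2·IoU / (1 + IoU), a monotonic function, so on a single image the two metrics order predictions the same way and Dice is never lower than IoU; averages over a dataset can still differ.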
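BLEU, sketched: sentence-level BLEU with uniform weights over 1- to 4-grams, clipped n-gram precision, and a brevity penalty based on the reference length closest to the candidate length. No smoothing is applied, so any n-gram order with zero matches yields 0. BLEU was defined at corpus level, and library implementations (e.g. NLTK, sacreBLEU) add tokenization and smoothing options not shown here.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate: List[str], references: List[List[str]], max_n: int = 4) -> float:
    if not candidate:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        # Clip each candidate n-gram count by its maximum count in any single reference
        max_ref: Counter = Counter()
        for ref in references:
            for g, c in ngrams(ref, n).items():
                max_ref[g] = max(max_ref[g], c)
        clipped = sum(min(c, max_ref[g]) for g, c in cand.items())
        total = sum(cand.values())
        if clipped == 0 or total == 0:
            return 0.0  # no smoothing: a zero precision makes the geometric mean 0
        log_precision_sum += math.log(clipped / total) / max_n
    # Brevity penalty with the closest reference length (shorter one wins ties)
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_precision_sum)


hyp = "the heart size is within normal limits".split()
refs = ["the heart size is normal".split()]
print(sentence_bleu(hyp, refs))
```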
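ROUGE-L, sketched: compute the longest common subsequence (LCS) length by dynamic programming, then combine LCS-based precision (LCS / candidate length) and recall (LCS / reference length) into F = (1 + β²)PR / (R + β²P). Pre-tokenized, lowercased input and β = 1 (the balanced F1) are assumptions of this sketch.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    # O(len(a) * len(b)) dynamic program, keeping only the previous row
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            curr[j] = prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def rouge_l(candidate: List[str], reference: List[str], beta: float = 1.0) -> float:
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0  # also avoids dividing by zero for empty inputs
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)


cand = "no acute cardiopulmonary process".split()
ref = "no acute process identified".split()
print(rouge_l(cand, ref))  # LCS "no acute process" = 3, P = R = 3/4
```

Because the LCS need not be contiguous, ROUGE-L rewards in-order word matches without requiring fixed-length n-grams, which is why it is reported alongside BLEU.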