CLIP: Contrastive Language-Image Pre-training—a model that learns to associate images and text by maximizing similarity between correct pairs
MAE: Masked Autoencoder—a vision model that learns by reconstructing missing parts of an image
CFP: Color Fundus Photography—a common 2D imaging technique for the retina
OCT: Optical Coherence Tomography—a non-invasive imaging technique that uses light waves to capture cross-sectional images of the retina
FFA: Fundus Fluorescein Angiography—a diagnostic procedure using dye to examine blood circulation in the retina
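AUROC: Area Under the Receiver Operating Characteristic curve—a classification metric that summarizes performance across all decision thresholds (0.5 corresponds to chance, 1.0 to perfect ranking)
Recall@K: A retrieval metric measuring whether the correct item appears among the top K returned results, usually reported as the fraction of queries for which it does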
Zero-shot: Testing a model on a task it was not explicitly trained for, often using class names as text prompts
Few-shot: Training a model with very few labeled examples per class (e.g., 1 to 16)
VQA: Visual Question Answering—a task where the model answers natural language questions about an image
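
As a rough illustration of the two metrics defined above (Recall@K and AUROC), here is a minimal Python sketch. It makes several assumptions that do not come from the source: the function names `recall_at_k` and `auroc` are hypothetical, the similarity matrix is assumed to be square with the correct match for query i at index i, and AUROC is computed by brute-force pairwise comparison (the probability that a random positive scores above a random negative), which is only practical for small inputs.

```python
def recall_at_k(similarity, k):
    """Fraction of queries whose matching item ranks in the top k.

    Assumption: similarity[i][j] scores query i against candidate j,
    and the correct candidate for query i is the one at index i.
    """
    hits = 0
    for query_idx, row in enumerate(similarity):
        # Candidate indices ordered from most to least similar.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if query_idx in ranked[:k]:
            hits += 1
    return hits / len(similarity)


def auroc(scores, labels):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. Labels are assumed to be 0 or 1.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

For example, with four scores `[0.9, 0.8, 0.4, 0.3]` and labels `[1, 0, 1, 0]`, three of the four positive/negative pairs are ordered correctly, so `auroc` returns 0.75.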