CFP: Color Fundus Photography—standard 2D retinal imaging showing fundus structure
FFA: Fundus Fluorescein Angiography—imaging technique using dye to capture vascular changes in the retina
OCT: Optical Coherence Tomography—imaging technique providing cross-sectional views of retinal layers
KeepFIT V2: Knowledge-Enhanced Pretraining for Fundus Image-Text V2, the pretraining framework proposed in this work
Elite Knowledge Spark: The concept of using a small, high-quality paired dataset to inject expert knowledge into a model trained primarily on coarser public data
Contrastive Learning: A learning method that aligns representations by pulling positive image-text pairs closer and pushing negative pairs apart
Generative Learning: A learning method in which the model learns to generate text from images (or vice versa), which pushes it to capture fine-grained local details
VLP: Vision-Language Pretraining—training models on paired images and text to learn joint representations
MM-Retinal-Text: A large text-only dataset of ophthalmic knowledge constructed by the authors for pretraining the text encoder
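The contrastive and generative learning objectives defined above can be illustrated with a minimal pure-Python sketch. It assumes two common textbook formulations. The contrastive term is a CLIP-style symmetric InfoNCE loss with a temperature of 0.07. The generative term is a teacher-forced, token-level cross-entropy over caption tokens. Neither is necessarily the exact loss used by KeepFIT V2. The function names, the temperature value, and the toy inputs are hypothetical and chosen only for illustration.

```python
import math
from typing import List


def log_softmax(row: List[float]) -> List[float]:
    # Numerically stable log-softmax: subtract the max before exponentiating.
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def normalize(v: List[float]) -> List[float]:
    # Scale a vector to unit L2 norm so dot products become cosine similarities.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def contrastive_loss(img_emb: List[List[float]], txt_emb: List[List[float]],
                     temperature: float = 0.07) -> float:
    # Assumption: CLIP-style symmetric InfoNCE, not necessarily KeepFIT V2's exact loss.
    # Image i and text i form the positive pair; all other texts/images in the batch are negatives.
    imgs = [normalize(v) for v in img_emb]
    txts = [normalize(v) for v in txt_emb]
    n = len(imgs)
    # logits[i][j]: scaled cosine similarity between image i and text j.
    logits = [[dot(imgs[i], txts[j]) / temperature for j in range(n)] for i in range(n)]
    # Image-to-text: each row should concentrate probability on its diagonal entry.
    i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-image: the same criterion applied to each column.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    t2i = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return (i2t + t2i) / 2


def generative_loss(token_logits: List[List[float]], target_ids: List[int]) -> float:
    # Assumption: teacher-forced captioning loss (mean negative log-likelihood).
    # token_logits[t] holds the decoder's vocabulary scores at step t, conditioned on
    # the image and the ground-truth tokens before t; target_ids[t] is the gold token.
    nll = [-log_softmax(step)[tok] for step, tok in zip(token_logits, target_ids)]
    return sum(nll) / len(nll)


# Usage: matched pairs give a near-zero contrastive loss; swapped pairs give a large one.
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

The contrastive term only asks that each image be closer to its own report than to the other reports in the batch. The generative term, by contrast, scores every token of the text. This difference is why the generative objective is described as forcing attention to local details.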