WSI: Whole Slide Image—a high-resolution (gigapixel) digital scan of a tissue slide used in pathology
Tile: A small, fixed-size square crop (e.g., 224 × 224 pixels) extracted from a WSI, which is far too large to process in a single pass
Virchow: A tile-level pathology foundation model (ViT-H/14) pre-trained on 1.5 million slides, used here to encode each tile into an embedding vector
Perceiver: A neural network architecture designed to handle very long inputs (such as thousands of tiles) by letting a small, fixed-size array of latent vectors cross-attend to the input, so the cost grows linearly rather than quadratically with input length
CoCa: Contrastive Captioners—a training framework combining contrastive loss (matching images to text) and generative loss (generating text from images)
BioGPT: A generative language model pre-trained on biomedical literature, used here as the text decoder
MIL: Multiple Instance Learning—a learning paradigm where a label is assigned to a bag of instances (tiles) rather than individual instances
Zero-shot: Making predictions on a new task (e.g., cancer detection) using only the pre-trained model and text prompts, without updating any model weights
Linear probing: Training a simple linear classifier on top of frozen model embeddings to evaluate the quality of the learned features
IHC: Immunohistochemistry—a staining process used to detect specific antigens (proteins) in cells, often used as ground truth for biomarker tasks
DCIS: Ductal Carcinoma In Situ—a pre-invasive cancerous lesion of the breast
NSCLC: Non-Small Cell Lung Cancer—the most common type of lung cancer
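The Tile entry above can be illustrated with a short sketch of how a slide is divided into a grid. It assumes square 224-pixel tiles with no overlap and simply drops partial tiles at the right and bottom edges; the helper name `tile_coordinates` is hypothetical. Real pipelines typically also discard background (non-tissue) tiles and read pixels at a chosen magnification through a slide-reading library, none of which is shown here.

```python
def tile_coordinates(width, height, tile_size=224):
    """Hypothetical helper: top-left (x, y) corners of every full, non-overlapping
    tile_size x tile_size tile that fits inside a width x height image.

    Partial tiles along the right and bottom edges are dropped (an assumption;
    some pipelines pad the slide instead).
    """
    coords = []
    for y in range(0, height - tile_size + 1, tile_size):      # rows of tiles
        for x in range(0, width - tile_size + 1, tile_size):   # columns of tiles
            coords.append((x, y))
    return coords


if __name__ == "__main__":
    # Toy 1000 x 500 "slide": 4 full tiles across and 2 down.
    coords = tile_coordinates(1000, 500)
    print(len(coords))   # 8
    print(coords[:3])    # [(0, 0), (224, 0), (448, 0)]
```

For scale, the same arithmetic applied to a hypothetical 100,000 × 80,000 pixel slide gives 446 × 357 = 159,222 tiles before any background filtering, which is why slide-level models need an aggregator that copes with very long inputs.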
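The Perceiver entry above can be made concrete with a single cross-attention step. In this minimal NumPy sketch (a simplification, not the model's actual implementation), a small array of latent vectors supplies the queries and the tile embeddings supply the keys and values, so the output shape depends only on the number of latents and the attention cost grows linearly with the number of tiles. The weights and latents are random placeholders for learned parameters, the function names are hypothetical, and the full architecture's multiple heads, layer normalization, feed-forward blocks, and latent self-attention layers are omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def perceiver_cross_attention(tiles, latents, w_q, w_k, w_v):
    """Single-head cross-attention from a latent array to a set of tile embeddings.

    tiles:   (n_tiles, d_in)    tile embeddings (e.g., from a tile encoder)
    latents: (n_latents, d_lat) latent array (learned in a real model)
    Returns  (n_latents, d_attn), independent of n_tiles.
    """
    q = latents @ w_q                            # queries come from the latents
    k = tiles @ w_k                              # keys come from the tiles
    v = tiles @ w_v                              # values come from the tiles
    scores = q @ k.T / np.sqrt(q.shape[-1])      # (n_latents, n_tiles): linear in n_tiles
    weights = softmax(scores, axis=-1)           # each latent's weights over tiles sum to 1
    return weights @ v


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_in, d_lat, d_attn, n_latents = 16, 8, 8, 4
    w_q = rng.normal(size=(d_lat, d_attn))
    w_k = rng.normal(size=(d_in, d_attn))
    w_v = rng.normal(size=(d_in, d_attn))
    latents = rng.normal(size=(n_latents, d_lat))
    for n_tiles in (10, 5000):
        tiles = rng.normal(size=(n_tiles, d_in))
        # Same (4, 8) output whether the slide has 10 tiles or 5000
        print(n_tiles, perceiver_cross_attention(tiles, latents, w_q, w_k, w_v).shape)
```

Because the latent array has a fixed size, the downstream layers never see the raw tile count, which is what lets a slide with tens of thousands of tiles be summarized in a bounded number of vectors.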
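The two objectives named in the CoCa entry above can be written out as a small NumPy sketch. The contrastive term here is a symmetric InfoNCE loss over a batch of paired image and text embeddings, and the captioning term is the mean token-level cross-entropy of the text decoder's predictions. The temperature of 0.07 and the equal default weights `w_con` and `w_cap` are illustrative assumptions, and in CoCa itself the decoder's logits are conditioned on the image through cross-attention, which this sketch simply takes as given.

```python
import numpy as np


def log_softmax(x, axis=-1):
    # Numerically stable log-softmax
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE: row i of img_emb is paired with row i of txt_emb."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)  # cosine similarity
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature          # (batch, batch); true pairs on the diagonal
    idx = np.arange(len(logits))
    i2t = -log_softmax(logits, axis=1)[idx, idx].mean()   # each image picks its text
    t2i = -log_softmax(logits, axis=0)[idx, idx].mean()   # each text picks its image
    return (i2t + t2i) / 2


def captioning_loss(token_logits, target_ids):
    """Mean cross-entropy of next-token predictions.
    token_logits: (..., vocab) decoder outputs; target_ids: matching (...) integer ids."""
    vocab = token_logits.shape[-1]
    logp = log_softmax(token_logits, axis=-1).reshape(-1, vocab)
    targets = np.asarray(target_ids).reshape(-1)
    return -logp[np.arange(len(targets)), targets].mean()


def coca_loss(img_emb, txt_emb, token_logits, target_ids, w_con=1.0, w_cap=1.0):
    # Weighted sum of the two objectives; the weights are assumed hyperparameters
    return (w_con * contrastive_loss(img_emb, txt_emb)
            + w_cap * captioning_loss(token_logits, target_ids))
```

With perfectly aligned embeddings (for example, identical one-hot rows) the contrastive term is close to zero, and with uniform decoder logits the captioning term equals the log of the vocabulary size, which makes both terms easy to sanity-check.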
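The MIL entry above can be illustrated by aggregating per-tile scores into a single slide-level prediction. This sketch assumes a classifier has already produced a probability for each tile (the numbers below are made up) and shows max pooling, which matches the classic MIL assumption that a bag is positive if any instance is positive, alongside mean pooling for contrast. The function name `bag_prediction` is hypothetical; learned aggregators such as attention pooling or a Perceiver are common alternatives to these fixed rules.

```python
import numpy as np


def bag_prediction(instance_scores, pooling="max"):
    """Hypothetical helper: aggregate per-tile probabilities into one slide-level score."""
    scores = np.asarray(instance_scores, dtype=float)
    if pooling == "max":       # classic MIL assumption: positive if any instance is positive
        return float(scores.max())
    if pooling == "mean":      # averages evidence, so one positive tile gets diluted
        return float(scores.mean())
    raise ValueError(f"unknown pooling: {pooling!r}")


if __name__ == "__main__":
    # Made-up tile probabilities; only the slide (bag) label is known
    slides = {
        "slide_a": ([0.02, 0.05, 0.91, 0.10], 1),   # one suspicious tile
        "slide_b": ([0.03, 0.08, 0.04, 0.06], 0),
    }
    for name, (tile_scores, label) in slides.items():
        print(name, label,
              round(bag_prediction(tile_scores), 2),
              round(bag_prediction(tile_scores, "mean"), 2))
```

With one suspicious tile among four, max pooling reports 0.91 for `slide_a` while mean pooling dilutes it to 0.27, which shows why the aggregation rule matters when only a small region of the slide is diagnostic.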
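The Zero-shot entry above reduces to comparing embeddings. A minimal sketch, assuming the slide embedding and the text-prompt embeddings already live in a shared space (as a contrastively trained model provides): normalize both, score each class prompt by cosine similarity, and pick the highest. The three-dimensional vectors and class names are toy placeholders rather than outputs of any real model, `zero_shot_classify` is a hypothetical helper, and no weights are updated at any point.

```python
import numpy as np


def zero_shot_classify(slide_emb, prompt_embs, class_names):
    """Hypothetical helper: pick the class whose prompt embedding is most similar
    (cosine similarity) to the slide embedding. No parameters are trained."""
    s = slide_emb / np.linalg.norm(slide_emb)
    p = prompt_embs / np.linalg.norm(prompt_embs, axis=1, keepdims=True)
    sims = p @ s                                  # one cosine similarity per class
    return class_names[int(np.argmax(sims))], sims


if __name__ == "__main__":
    # Toy 3-d embeddings standing in for text-encoder outputs of two prompts
    class_names = ["benign tissue", "invasive carcinoma"]
    prompt_embs = np.array([[1.0, 0.0, 0.2],
                            [0.1, 1.0, 0.0]])
    slide_emb = np.array([0.2, 0.9, 0.1])
    label, sims = zero_shot_classify(slide_emb, prompt_embs, class_names)
    print(label)            # invasive carcinoma
    print(np.round(sims, 3))
```

Switching to a new task only requires writing new prompts and encoding them, which is what distinguishes zero-shot use from linear probing or fine-tuning.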
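The Linear probing entry above can be sketched as logistic regression trained by gradient descent on a fixed feature matrix. The synthetic Gaussian clusters stand in for frozen slide or tile embeddings (an assumption made so the example is self-contained); only the probe's weight vector `w` and bias `b` change during training, and the features are never modified. The function names are hypothetical, the demo reports training accuracy only for brevity (a real evaluation would use a held-out split), and a library implementation such as scikit-learn's `LogisticRegression` would normally replace this hand-written loop.

```python
import numpy as np


def train_linear_probe(features, labels, lr=0.1, epochs=500):
    """Binary logistic-regression probe trained by full-batch gradient descent.
    `features` is treated as frozen: it is read but never modified."""
    n, d = features.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        probs = 1.0 / (1.0 + np.exp(-(features @ w + b)))   # sigmoid of the logits
        grad = probs - labels               # gradient of mean binary cross-entropy w.r.t. logits
        w -= lr * features.T @ grad / n     # only the probe's parameters are updated
        b -= lr * grad.mean()
    return w, b


def predict(features, w, b):
    # Threshold at probability 0.5, i.e. logit 0
    return (features @ w + b > 0).astype(int)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for frozen embeddings: two well-separated Gaussian clusters
    X = np.vstack([rng.normal(-2.0, 1.0, size=(50, 8)),
                   rng.normal(2.0, 1.0, size=(50, 8))])
    y = np.array([0] * 50 + [1] * 50)
    w, b = train_linear_probe(X, y)
    print("train accuracy:", (predict(X, w, b) == y).mean())
```

Keeping the probe linear is the point of the method: if a single linear layer separates the classes well, the information was already present in the frozen embeddings rather than learned by the classifier.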