GSD: Ground Sampling Distance—the distance between adjacent pixel centers as measured on the ground (e.g., 0.1 m/pixel means each pixel spans 10 cm on the ground); a smaller GSD means higher spatial resolution
OSM: OpenStreetMap—an open, crowdsourced geographic database where map features are described by 'tags' (key-value pairs like 'building=residential')
GEE: Google Earth Engine—a cloud computing platform for processing satellite imagery and other earth observation data
CLIP: Contrastive Language-Image Pre-training—a model trained to align images and text in a shared embedding space, enabling zero-shot classification
VLM: Vision-Language Model—a model that processes and relates both image and text inputs
Visual Groundability: The property of a semantic concept being visually identifiable in an image at a specific resolution (e.g., a 'stream' is visible, a 'house number' is not)
Zero-shot transfer: The ability of a model to perform a task (like classifying a new scene type) without having been explicitly trained on examples of that specific task
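
As a rough illustration of two of the terms above, the following minimal sketch shows the arithmetic behind GSD and the nearest-text-embedding idea behind CLIP-style zero-shot classification. The function names, the tile size, and the 3-dimensional toy embeddings are all assumptions made up for this example; a real pipeline would obtain embeddings from an actual image and text encoder.

```python
import math

def ground_footprint_m(width_px, height_px, gsd_m_per_px):
    """Ground extent (width, height) in meters covered by an image at a given GSD."""
    return width_px * gsd_m_per_px, height_px * gsd_m_per_px

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def zero_shot_label(image_vec, text_vecs):
    """Return the label whose text embedding is most similar to the image embedding."""
    return max(text_vecs, key=lambda label: cosine(image_vec, text_vecs[label]))

# Hypothetical 512 x 512 tile at 0.1 m/pixel: 512 * 0.1 m per side.
footprint = ground_footprint_m(512, 512, 0.1)

# Toy embeddings (invented values); OSM-style tags serve as candidate labels.
toy_text = {
    "building=residential": [0.9, 0.1, 0.0],
    "waterway=stream": [0.0, 0.2, 0.9],
}
toy_image = [0.1, 0.1, 0.8]
label = zero_shot_label(toy_image, toy_text)
```

No training on the candidate labels is involved here: the label is chosen purely by similarity in the shared embedding space, which is what makes the classification zero-shot.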