JEPA: Joint-Embedding Predictive Architecture—a self-supervised framework where a predictor tries to predict the embeddings of masked regions based on context embeddings.
I-JEPA: Image-based JEPA—a specific JEPA variant operating on image patches.
Equivariance: A property where transforming an input image (e.g., rotating it) results in a corresponding predictable transformation in the embedding space.
MRR: Mean Reciprocal Rank—a metric used here to evaluate equivariance by ranking the correct geometric transformation among a set of augmented embeddings based on cosine similarity.
Patch Embeddings: Vector representations of small, fixed-size square regions of an image (patches), produced by Vision Transformers.
HLS: Harmonized Landsat and Sentinel-2, a data product that combines imagery from the Landsat and Sentinel-2 satellites into a consistent record, commonly used for Earth observation.
Prithvi-EO-2.0: A foundation model for Earth observation data, based on the masked autoencoder (MAE) architecture.
CLS token: Classification token—a special token prepended to the input sequence in Transformers to aggregate global image information.
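
To make the MRR entry above concrete, here is a minimal sketch of how a mean reciprocal rank over cosine similarities could be computed. It rests on assumptions: each query embedding is compared against a small set of candidate embeddings, exactly one of which corresponds to the correct geometric transformation. The function names, argument layout, and tie handling are illustrative and not taken from the source, whose exact evaluation protocol may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_reciprocal_rank(queries, candidate_sets, correct_indices):
    """Average of 1/rank of the correct candidate for each query.

    queries:         list of query embeddings (hypothetical layout)
    candidate_sets:  one list of candidate embeddings per query
    correct_indices: index of the correct candidate in each set
    """
    reciprocal_ranks = []
    for query, candidates, correct in zip(queries, candidate_sets, correct_indices):
        sims = [cosine_similarity(query, c) for c in candidates]
        # Rank 1 means no candidate is strictly more similar than the correct one.
        rank = 1 + sum(s > sims[correct] for s in sims)
        reciprocal_ranks.append(1.0 / rank)
    return sum(reciprocal_ranks) / len(reciprocal_ranks)


# Toy data: the correct candidate ranks first for query 0 and second for query 1.
queries = [[1.0, 0.0], [0.0, 1.0]]
candidate_sets = [
    [[0.0, 1.0], [1.0, 0.1], [1.0, 1.0]],
    [[0.0, 1.0], [1.0, 1.0], [1.0, 0.0]],
]
correct_indices = [1, 1]
mrr = mean_reciprocal_rank(queries, candidate_sets, correct_indices)
```

In this toy case the reciprocal ranks are 1 and 1/2, so the MRR is 0.75. A value of 1.0 would mean the correct transformation is always ranked first.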