Tokeniser: A neural network (usually an autoencoder) that compresses high-dimensional data into compact latent representations (tokens) for a downstream model to process
VRMSE: Variance-Normalised Root Mean Squared Error—a metric measuring reconstruction error relative to the natural variability of the target field
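A minimal sketch of this metric, assuming the common definition (RMSE divided by the standard deviation of the target); the paper's exact variant may add an epsilon or normalise per channel:

```python
import numpy as np

def vrmse(pred, target):
    """Variance-normalised RMSE (illustrative definition): the RMSE of
    the prediction divided by the standard deviation of the target
    field, so a score of 1.0 matches predicting the field's mean."""
    mse = np.mean((pred - target) ** 2)
    return np.sqrt(mse / np.var(target))
```

A perfect reconstruction scores 0, and the normalisation makes errors comparable across fields with very different dynamic ranges.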
NEPS: Normalised Error Power Spectrum—a frequency-domain metric measuring the ratio of error power to signal power at specific spatial scales (wavenumbers)
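A 1-D sketch of the idea, assuming the straightforward definition (error power divided by target power per wavenumber); the function name and normalisation details are illustrative, not the paper's implementation:

```python
import numpy as np

def neps(pred, target):
    """Normalised Error Power Spectrum (1-D illustrative sketch):
    power of the error at each wavenumber divided by the power of
    the target at that wavenumber, revealing which spatial scales
    the model reconstructs well or poorly."""
    err_power = np.abs(np.fft.rfft(pred - target)) ** 2
    tgt_power = np.abs(np.fft.rfft(target)) ** 2
    # small floor avoids division by zero at empty wavenumbers
    return err_power / np.maximum(tgt_power, 1e-12)
```

Unlike a single scalar error, this exposes scale-dependent failure modes, such as a model that captures large-scale structure but blurs fine detail.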
Rollout: The process of generating a sequence of future predictions autoregressively, where each prediction is fed back as input for the next step
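The feedback loop described above can be sketched in a few lines; `model` here stands for any one-step predictor (an assumption, not the paper's interface):

```python
def rollout(model, state, n_steps):
    """Autoregressive rollout sketch: apply a one-step predictor
    repeatedly, feeding each prediction back as the next input.
    Errors therefore compound over the trajectory."""
    trajectory = [state]
    for _ in range(n_steps):
        state = model(state)  # prediction becomes the next input
        trajectory.append(state)
    return trajectory
```

Because each step consumes the previous step's output rather than ground truth, rollout length is the standard stress test for error accumulation.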
FSDP: Fully Sharded Data Parallel—a memory-optimisation technique for distributed training that shards model parameters across GPUs
DDP: Distributed Data Parallel—a parallel training technique where each process holds a full model copy and gradients are synchronised
SOAP: ShampoO with Adam in the Preconditioner's eigenbasis—a second-order optimiser that runs Adam in the eigenbasis of Shampoo's preconditioner; used for pretraining the tokeniser in this paper
Causal convolution: Convolution operations that only use information from past and present time steps, preserving the temporal order required for autoregressive tasks
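One standard way to realise this constraint is to left-pad the input with zeros before a plain convolution, so the kernel never covers future samples; this is an illustrative 1-D implementation, not the paper's code:

```python
import numpy as np

def causal_conv1d(x, kernel):
    """Causal 1-D convolution sketch: y[t] = sum_i kernel[i] * x[t - i],
    so the output at time t depends only on present and past inputs.
    Left-padding with k-1 zeros keeps the kernel off future samples."""
    k = len(kernel)
    padded = np.concatenate([np.zeros(k - 1), x])
    return np.array(
        [np.dot(padded[t:t + k], kernel[::-1]) for t in range(len(x))]
    )
```

With `kernel = [0, 1]` the output is the input delayed by one step, confirming that no future information leaks backwards in time.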