MPP: Multiple Physics Pretraining, the proposed framework for training a single model on diverse physical systems
RevIN: Reversible Instance Normalization, a technique that normalizes each input by its own mean and variance and then denormalizes the model's output using the same statistics
Axial Attention: An attention mechanism that computes attention along specific axes (e.g., time, height, width) sequentially rather than all at once, reducing computational complexity
PDE: Partial Differential Equation, an equation describing how physical quantities change over space and time
Surrogate Model: A fast approximation model (often a neural network) used to predict system behavior instead of running a slow, exact numerical simulation
Autoregressive: A prediction setup where the model predicts the next step in a sequence and feeds that prediction back as input for the following step
Gradient Accumulation: A training technique where gradients are calculated over multiple micro-batches before updating model weights, allowing for larger effective batch sizes
Spatio-temporal: Relating to both space and time
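
To make the RevIN and Autoregressive entries above concrete, here is a minimal sketch in plain Python. It is not the MPP implementation: `toy_model` is a hypothetical stand-in for a trained surrogate (it just extrapolates linearly), the function names and the `context` and `eps` parameters are assumptions made for illustration, and the data is a 1-D list rather than a spatio-temporal field.

```python
from statistics import fmean, pstdev


def revin_normalize(window, eps=1e-5):
    """Normalize one input window by its own mean and standard deviation.

    Returns the normalized values plus the statistics needed to undo it.
    """
    mean = fmean(window)
    std = pstdev(window) + eps  # eps avoids division by zero on constant input
    normalized = [(x - mean) / std for x in window]
    return normalized, (mean, std)


def revin_denormalize(values, stats):
    """Map values in normalized space back to the original scale."""
    mean, std = stats
    return [v * std + mean for v in values]


def toy_model(normalized_window):
    """Hypothetical surrogate: linear extrapolation from the last two steps."""
    return 2 * normalized_window[-1] - normalized_window[-2]


def autoregressive_rollout(history, n_steps, context=3):
    """Predict n_steps ahead, feeding each prediction back in as input."""
    trajectory = list(history)
    for _ in range(n_steps):
        window = trajectory[-context:]                    # most recent steps
        normalized, stats = revin_normalize(window)       # normalize the input
        pred_norm = toy_model(normalized)                 # predict in normalized space
        pred = revin_denormalize([pred_norm], stats)[0]   # restore original scale
        trajectory.append(pred)                           # prediction becomes input
    return trajectory[len(history):]


if __name__ == "__main__":
    print(autoregressive_rollout([1.0, 2.0, 3.0], n_steps=2))
```

For a linearly increasing history such as `[1.0, 2.0, 3.0]`, the rollout continues the line (up to floating-point error), because the toy extrapolation is unaffected by the normalize/denormalize round trip. With a learned model, errors in each prediction would instead compound across steps, which is the main practical difficulty of autoregressive rollouts.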