FABRIC: An NSF-funded nationwide research infrastructure offering programmable compute and networking for academic users
SLM: Small Language Model—a model with comparatively few parameters (e.g., GPT-2 variants at roughly 124M–774M params) rather than LLM scale, making it suitable for academic budgets
Pipeshard: A parallelism strategy from Alpa that combines pipeline parallelism (inter-operator) with shard parallelism (intra-operator) to optimize communication
Data Parallelism: A training strategy where the model is replicated on every GPU and gradients are synchronized (averaged) after each step
ZeRO: Zero Redundancy Optimizer—a method to reduce memory usage in data parallelism by partitioning optimizer states across GPUs
Shard Parallelism: Intra-operator parallelism where individual tensors/operators are partitioned across devices (similar to Megatron-LM tensor parallelism)
TFLOP/s: Tera Floating Point Operations Per Second—a unit of compute performance, commonly used here to report training throughput
L2STS: Layer 2 Site-to-Site Connection Service—a FABRIC network service connecting VMs between different geographic sites
L2Bridge: Layer 2 Bridge Service—a FABRIC network service connecting VMs within a single site
Microbatches: Small chunks of a training batch used in pipeline parallelism to overlap computation and communication
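Two of the entries above—data parallelism's gradient averaging and pipeline parallelism's microbatching—can be illustrated with a minimal, framework-free sketch. The function names and values below are hypothetical, chosen only to make the mechanics concrete, not drawn from any particular library:

```python
# Hypothetical sketch of two glossary concepts, using plain Python lists
# in place of real tensors and GPUs.

def split_into_microbatches(batch, num_microbatches):
    """Pipeline parallelism: split one training batch into equal-sized
    microbatches so computation and communication can overlap."""
    size = len(batch) // num_microbatches
    return [batch[i * size:(i + 1) * size] for i in range(num_microbatches)]

def allreduce_average(per_gpu_grads):
    """Data parallelism: average the gradients computed by each replica,
    as an all-reduce does after every training step."""
    num_replicas = len(per_gpu_grads)
    return [sum(g) / num_replicas for g in zip(*per_gpu_grads)]

batch = list(range(8))
print(split_into_microbatches(batch, 4))
# -> [[0, 1], [2, 3], [4, 5], [6, 7]]

print(allreduce_average([[1.0, 2.0], [3.0, 4.0]]))
# -> [2.0, 3.0]
```

In a real run, `allreduce_average` would be a collective over the network (e.g., an L2STS link between sites), which is why the parallelism strategy chosen determines the communication volume.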