FLOPs: Floating Point Operations—a measure of computational work. Training compute is approximated as 6 * N * D, where N is the number of model parameters and D is the number of training tokens.
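The 6 * N * D approximation can be checked numerically. A minimal sketch, plugging in Chinchilla's approximate figures (70B parameters, ~1.4T tokens); the factor of 6 is the standard rough accounting of ~2 FLOPs per parameter per token for the forward pass and ~4 for the backward pass:

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute: ~6 FLOPs per parameter per token
    (~2 forward, ~4 backward)."""
    return 6 * n_params * n_tokens

# Chinchilla-scale example: 70e9 parameters, 1.4e12 tokens.
flops = training_flops(70e9, 1.4e12)
print(f"{flops:.2e}")  # → 5.88e+23
```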
IsoFLOP: A curve or profile analyzing model performance while keeping the total computational budget (FLOPs) constant, varying only model size and token count.
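An isoFLOP sweep follows directly from the compute formula: holding C = 6 * N * D fixed, each candidate model size N pins down the token count D. A hypothetical sweep (the budget and model sizes below are illustrative, not the paper's exact grid):

```python
def isoflop_pairs(budget_flops: float, model_sizes: list) -> list:
    """For a fixed compute budget C, each model size N determines the
    token count D via C = 6 * N * D, i.e. D = C / (6 * N)."""
    return [(n, budget_flops / (6 * n)) for n in model_sizes]

# Illustrative sweep at a Chinchilla-scale budget of ~5.88e23 FLOPs.
for n, d in isoflop_pairs(5.88e23, [10e9, 40e9, 70e9, 280e9]):
    print(f"N = {n:.0e} params -> D = {d:.2e} tokens")
```

At this budget, a 70B model lands at ~1.4T tokens while a 280B model gets only ~0.35T, which is the trade-off an isoFLOP profile maps out.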
MMLU: Massive Multitask Language Understanding—a benchmark covering 57 subjects like STEM, humanities, and social sciences.
Chinchilla: The 70B parameter model trained in this paper using the newly derived optimal scaling parameters.
Gopher: A 280B parameter model previously trained by DeepMind, used as the primary baseline for compute-budget comparisons.
Kaplan Scaling: Refers to the 2020 OpenAI paper proposing that as the compute budget C grows, model size should scale much faster than data (roughly N ∝ C^0.73 vs. D ∝ C^0.27).
MassiveText: The large-scale text dataset used for training both Gopher and Chinchilla.
BPB: Bits Per Byte—a metric for evaluating language models, equivalent to loss normalized by the length of the text in bytes.
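The BPB definition above is a unit conversion: per-token cross-entropy loss (in nats) divided by the average number of bytes per token, then divided by ln(2) to convert nats to bits. A minimal sketch; the loss and bytes-per-token figures below are hypothetical, since the actual ratio depends on the tokenizer and dataset:

```python
import math

def bits_per_byte(loss_nats_per_token: float, bytes_per_token: float) -> float:
    """Convert per-token cross-entropy (in nats) to bits per byte:
    divide by bytes/token to get nats per byte, then by ln(2) for bits."""
    return loss_nats_per_token / (bytes_per_token * math.log(2))

# Hypothetical figures: 2.0 nats/token, ~4 bytes per token on average.
print(round(bits_per_byte(2.0, 4.0), 3))  # → 0.721
```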