NLI: Natural Language Inference—determining whether one sentence entails, contradicts, or is neutral with respect to another
RACE: Large-scale ReAding Comprehension Dataset From Examinations—a multiple-choice reading-comprehension benchmark built from English exam questions
GLUE: General Language Understanding Evaluation—a collection of resources for training, evaluating, and analyzing natural language understanding systems
Transformer: A neural network architecture based on self-attention mechanisms, processing sequences in parallel rather than sequentially
BPE: Byte Pair Encoding—a tokenization method that iteratively merges frequent pairs of characters or bytes to form subword units
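The merge procedure behind BPE can be sketched in a few lines. This is a minimal illustration of learning merges over a toy word list (the function name and corpus are hypothetical, not from any particular tokenizer implementation):

```python
from collections import Counter

def learn_bpe_merges(words, num_merges):
    # Start with each word as a sequence of single characters.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count frequencies of adjacent symbol pairs across the corpus.
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Replace every occurrence of the best pair with a merged symbol.
        new_vocab = Counter()
        for word, freq in vocab.items():
            merged, i = [], 0
            while i < len(word):
                if i < len(word) - 1 and (word[i], word[i + 1]) == best:
                    merged.append(word[i] + word[i + 1])
                    i += 2
                else:
                    merged.append(word[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges, vocab
```

Running this repeatedly grows a subword vocabulary: frequent character sequences become single units while rare words remain decomposable into smaller pieces.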
SOTA: State of the Art—the current best performance on a specific task or benchmark
LSTM: Long Short-Term Memory—a type of recurrent neural network capable of learning order dependence in sequence prediction problems
Zero-shot: The ability of a model to perform a task without having seen any specific training examples for that task
GELU: Gaussian Error Linear Unit—an activation function used in neural networks
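GELU has a simple closed form—x times the standard normal CDF—so it can be computed directly; a sketch, including the tanh approximation that some implementations use in place of the exact erf form:

```python
import math

def gelu(x):
    # Exact form: GELU(x) = x * Phi(x), where Phi is the standard
    # normal CDF, written via the error function.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x):
    # Common tanh-based approximation of the same function.
    return 0.5 * x * (1.0 + math.tanh(
        math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))
```

Unlike ReLU, GELU is smooth and weights inputs by their probability under a standard normal rather than hard-gating at zero.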
Perplexity: A measurement of how well a probability model predicts a sample; lower values indicate better performance
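Concretely, perplexity is the exponentiated average negative log-likelihood of the observed tokens. A minimal sketch (the function and input list are illustrative, assuming per-token probabilities from some model):

```python
import math

def perplexity(token_probs):
    # token_probs: the probability the model assigned to each
    # token that actually occurred in the sample.
    n = len(token_probs)
    avg_nll = -sum(math.log(p) for p in token_probs) / n
    return math.exp(avg_nll)
```

A model that assigns uniform probability 1/k to every token has perplexity exactly k, which is why perplexity is often read as the model's effective branching factor.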