CoT: Chain-of-Thought—a technique where models generate intermediate reasoning steps before the final answer.
SFT: Supervised Fine-Tuning—training a pre-trained model on a labeled dataset to follow specific instructions.
FOL: First-Order Logic—a formal system using quantifiers (∀, ∃) and predicates to express logical relations, e.g., ∀x (Human(x) → Mortal(x)).
Distillation: The process of training a smaller 'student' model to mimic the outputs or behavior of a larger 'teacher' model (here, GPT-4).
Zero-Shot-CoT: Prompting a model with 'Let's think step by step' without providing examples, to elicit reasoning.
MMLU: Massive Multitask Language Understanding—a benchmark covering 57 subjects like math, history, and law.
LogiEval: A benchmark suite specifically designed to test logical reasoning, comprising datasets like LogiQA, ReClor, and AR-LSAT.
MRC: Machine Reading Comprehension—tasks where the model answers questions based on a provided text passage.