CoT: Chain-of-Thought—a prompting technique where the model is encouraged to generate intermediate reasoning steps before the final answer
Direct Answering: A prompting strategy where the model is instructed to output the final answer immediately without intermediate reasoning steps
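The two strategies above differ only in the instruction wrapped around the question. A minimal sketch (the wrapper function, instruction wording, and example question are illustrative assumptions, not taken from this work):

```python
def build_prompt(question: str, use_cot: bool) -> str:
    """Wrap a question with either a CoT or a direct-answering instruction.

    Illustrative only: actual prompt templates vary by model and paper.
    """
    if use_cot:
        # CoT: elicit intermediate reasoning steps before the final answer
        return f"Q: {question}\nA: Let's think step by step."
    # Direct Answering: request the final answer with no intermediate reasoning
    return f"Q: {question}\nA: Respond with only the final answer."

q = "A pen costs $2 and a notebook costs $3. What do 4 pens and 2 notebooks cost?"
print(build_prompt(q, use_cot=True))
print(build_prompt(q, use_cot=False))
```

In practice, the completion generated under the CoT prompt is typically parsed with an answer extractor (e.g., taking the text after a marker such as "The answer is"), whereas the direct prompt is scored on the raw completion.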
MMLU: Massive Multitask Language Understanding—a broad multiple-choice benchmark of 57 subjects spanning STEM, humanities, social sciences, and more
GSM8K: Grade School Math 8K—a benchmark of roughly 8.5K grade-school math word problems requiring multi-step arithmetic reasoning
Symbolic Reasoning: Problems grounded in a formal system (e.g., math, logic, code) where a symbolic expression can be derived and solved
Soft Reasoning: Problems relying on commonsense or natural language inference where no formal logical system or strict ruleset exists to derive the answer
vLLM: A high-throughput and memory-efficient inference engine for LLMs
SFT: Supervised Fine-Tuning—further training a pretrained model on labeled input–output pairs to elicit a desired behavior