CoT: Chain-of-Thought—a prompting technique where models generate step-by-step natural language reasoning before answering
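As a minimal illustration of the technique (the exact prompt wording below is an assumption, not taken from the paper), a CoT prompt simply appends a step-by-step reasoning instruction to the question:

```python
# Minimal sketch of Chain-of-Thought prompting. The "Let's think step by
# step" phrasing is one common zero-shot CoT trigger, used here as an
# illustrative assumption rather than the paper's prescribed wording.

def make_cot_prompt(question: str) -> str:
    """Wrap a question with a step-by-step reasoning instruction."""
    return (
        f"Q: {question}\n"
        "A: Let's think step by step."
    )

prompt = make_cot_prompt("If a train travels 60 km in 1.5 hours, what is its speed?")
print(prompt)
```

The model is then expected to produce intermediate reasoning sentences before its final answer, which is exactly the verbosity SoT's sketches aim to compress.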
SoT: Sketch-of-Thought—the proposed framework using concise, structured reasoning sketches instead of verbose sentences
DistilBERT: A small, fast, cheap distilled version of the BERT language model, used here as a lightweight router that classifies each question into a reasoning paradigm
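The routing step can be sketched as follows. This is a control-flow sketch only: the fine-tuned DistilBERT classifier is replaced by a keyword-heuristic stub so the snippet runs without model weights, and the paradigm label strings are assumptions.

```python
# Sketch of paradigm routing. In SoT a fine-tuned DistilBERT classifier
# picks the reasoning paradigm for each question; here a stub heuristic
# stands in for that model so the control flow is runnable without
# weights. The label names are illustrative assumptions.

PARADIGMS = ("conceptual_chaining", "chunked_symbolism", "expert_lexicons")

def classify(question: str) -> str:
    """Stub for the DistilBERT router: a keyword heuristic, NOT the real model."""
    if any(ch.isdigit() for ch in question):
        return "chunked_symbolism"   # numeric questions -> symbolic sketching
    return "conceptual_chaining"     # default: associative concept links

def route(question: str) -> str:
    paradigm = classify(question)
    assert paradigm in PARADIGMS
    return paradigm

print(route("What is 12 * 7?"))           # contains digits -> chunked_symbolism
print(route("Why do people carry umbrellas?"))
```

In the real pipeline the chosen label selects the paradigm-specific system prompt given to the reasoning model.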
FlashAttention2: An algorithm that speeds up the attention mechanism in Transformers by reducing memory access overhead
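When models are loaded through Hugging Face `transformers`, FlashAttention-2 can typically be enabled at load time. This is a configuration sketch, not the paper's setup: the model name is an example, and it requires the `flash-attn` package plus a supported GPU, so it is not runnable on CPU.

```python
# Configuration sketch: enabling FlashAttention-2 in Hugging Face
# transformers. Requires the flash-attn package and a supported GPU;
# the model name is an illustrative example, not from this glossary.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",
    torch_dtype=torch.float16,               # FlashAttention-2 needs fp16/bf16
    attn_implementation="flash_attention_2",
)
```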
Conceptual Chaining: A reasoning paradigm based on associative memory that links concepts via short pathways (e.g., Rain -> Umbrella)
Chunked Symbolism: A reasoning paradigm based on working memory that uses mathematical notation to compress logic (e.g., Var1 + Var2)
Expert Lexicons: A reasoning paradigm using domain-specific jargon and acronyms to compress technical reasoning
LLM-as-a-judge: Using a strong LLM (like GPT-4o) to evaluate the correctness of open-ended responses from other models
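The judging step can be sketched as below. The prompt wording and the binary CORRECT/INCORRECT protocol are assumptions for illustration; in practice the judge reply would come from a real API call to a strong model such as GPT-4o, which is deliberately omitted here.

```python
# Sketch of LLM-as-a-judge evaluation. The judge prompt wording is an
# assumption, and no API call is made: a canned judge reply stands in
# for the strong model's (e.g. GPT-4o's) actual response.

def build_judge_prompt(question: str, reference: str, response: str) -> str:
    """Assemble a grading prompt for the judge model (illustrative wording)."""
    return (
        "You are grading a model's answer.\n"
        f"Question: {question}\n"
        f"Reference answer: {reference}\n"
        f"Model answer: {response}\n"
        "Reply with exactly CORRECT or INCORRECT."
    )

def parse_verdict(judge_reply: str) -> bool:
    """Map the judge's free-text reply to a boolean score."""
    return judge_reply.strip().upper().startswith("CORRECT")

prompt = build_judge_prompt("Capital of France?", "Paris", "It is Paris.")
print(parse_verdict("CORRECT"))    # True
print(parse_verdict("INCORRECT"))  # False
```

Note that `startswith` (rather than substring matching) keeps "INCORRECT" from being misread as a pass.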