LMulator: A mechanism that uses a Language Model to simulate the execution of code that a standard interpreter cannot run (e.g., semantic functions)
Chain of Thought (CoT): A prompting technique where the model generates intermediate natural language reasoning steps before the final answer
Program of Thoughts (PoT): A technique where the model generates executable code to solve the problem, delegating all computation to an external interpreter
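A minimal PoT-style sketch: the model's entire output is runnable code, and the final answer comes from the interpreter rather than the model's own arithmetic. The word problem and variable names here are hypothetical illustrations, not from the source.

```python
# Hypothetical problem: "A jar holds 3 red and 5 blue marbles; two bags of
# 4 marbles each are added. How many marbles in total?"
# In PoT, the model emits only the code below; the interpreter runs it.
red, blue = 3, 5
bags, per_bag = 2, 4
total = red + blue + bags * per_bag  # computed by the interpreter, not the LM
```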
ScratchPad: A prompting method where the model maintains an intermediate program state trace to simulate execution
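A rough sketch of the ScratchPad idea, assuming a simple two-step arithmetic program: after each step, the intermediate program state is written out, mirroring how the prompt asks the model to externalize its working memory. The step strings and trace format are illustrative assumptions.

```python
# Simulate a ScratchPad trace: execute each step and record the state
# after it, the way the model would write the trace line by line.
steps = ["x = 3 * 4", "y = x + 5"]
state, trace = {}, []
for step in steps:
    exec(step, state)  # run the step to obtain the next state
    visible = {k: v for k, v in state.items() if not k.startswith("__")}
    trace.append(f"{step}  # state: {visible}")
```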
BIG-Bench Hard (BBH): A subset of the BIG-Bench benchmark containing 23 challenging tasks on which prior LLMs failed to beat average human-rater performance
Interweave: The execution mode where control switches back and forth, line by line, between the Python interpreter (for valid code) and the LLM (for semantic/undefined code)
Try-Except: Python's error-handling mechanism, used here to catch calls to undefined semantic functions and trigger the LMulator
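The two entries above can be sketched together: run each generated line on the interpreter, and when a NameError reveals an undefined semantic function, hand that line to the LM instead. `mock_llm` below is a hypothetical stand-in for a real model call (it fakes sentiment by keyword matching), and the program lines are illustrative, not from the source.

```python
def mock_llm(expr, env):
    # Hypothetical LMulator stub: pretend the LM evaluates the semantic
    # call is_positive(review) by crude keyword matching.
    return "love" in env["review"].lower()

program = [
    "review = 'I absolutely love this movie!'",  # valid Python: interpreter
    "verdict = is_positive(review)",             # undefined: goes to the LM
    "label = 'positive' if verdict else 'negative'",
]

env = {}
for line in program:
    try:
        exec(line, env)  # try the interpreter first
    except NameError:
        # Semantic line: let the LM produce the value and store it
        target, expr = line.split(" = ", 1)
        env[target.strip()] = mock_llm(expr, env)
```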
System 1 vs System 2: Cognitive science terms often applied to AI: System 1 is fast/intuitive (LLM semantic prediction), System 2 is slow/deliberate (Code execution)
Semantic reasoning: Tasks requiring understanding of meaning, nuance, or common sense (e.g., 'is this movie review positive?')
Algorithmic reasoning: Tasks requiring strict logic, math, or symbolic manipulation (e.g., 'sort this list of 50 numbers')