MWP: Math Word Problems—mathematical questions presented as natural language narratives rather than pure equations
ATP: Automated Theorem Proving—autonomous construction of logical proofs for mathematical conjectures
Chain-of-Thought: A prompting technique that encourages the model to generate intermediate reasoning steps before the final answer
Program-of-Thought: A method where the LLM expresses the reasoning steps of a math problem as executable code (such as Python) and an interpreter runs that code to produce the answer, separating reasoning from computation
TabMWP: A dataset of math word problems that require reasoning over tabular contexts, where the table is provided as an image, semi-structured text, or a structured table
Geometry: Problems requiring spatial understanding of shapes, sizes, and interrelationships, often involving visual or symbolic inputs
Fine-tuning: Adjusting the parameters of a pre-trained model on a specific dataset (e.g., math problems) to improve performance
Python REPL: An interactive shell (Read-Eval-Print Loop) that allows the LLM to execute code snippets to verify calculations
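
As an illustration of how the Program-of-Thought and Python REPL entries relate, the following is a minimal sketch rather than any particular system's implementation. The word problem, the `GENERATED_PROGRAM` string, and the `run_program` helper are all hypothetical; the string stands in for text a model might emit, and no real model is called. The convention that the program stores its result in a variable named `answer` is an assumption made for this example.

```python
# Hypothetical stand-in for code an LLM might generate under
# Program-of-Thought prompting for the problem:
# "A shop has 7 boxes of 12 apples each. 5 apples are eaten. How many remain?"
GENERATED_PROGRAM = """
apples_per_box = 12
boxes = 7
eaten = 5
answer = apples_per_box * boxes - eaten
"""


def run_program(source: str) -> object:
    """Execute generated code in a fresh namespace and return its `answer`.

    Emptying the builtins only limits what the snippet can call; it is NOT
    a security sandbox and should not be used on untrusted model output.
    """
    namespace: dict = {}
    exec(source, {"__builtins__": {}}, namespace)
    return namespace["answer"]


# The model supplies the reasoning (the program); the interpreter
# supplies the arithmetic.
result = run_program(GENERATED_PROGRAM)
print(result)
```

The same helper could be used to check a number stated in a Chain-of-Thought answer, by comparing it with the value the interpreter computes.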