MathFunc: The novel training corpus constructed in this paper, containing ~30k samples of questions, plans, retrieved tools, and code-integrated solutions
SciToolBench: A new benchmark dataset created by the authors covering 5 scientific domains (Math, Physics, Chemistry, EECS, Finance) with domain-specific toolsets
Ad-hoc functions: Functions generated specifically for a single problem instance (e.g., hardcoding numbers) rather than being general-purpose reusable tools
Cross-retrieval: A data construction strategy in which the solution for problem A must use tools generated for other problems (e.g., problem B), so that the model learns to select generalized tools rather than rely on ad-hoc ones
Program-of-Thought (PoT): A reasoning format in which the model writes its reasoning steps as executable Python code rather than as text alone
Rationale: Natural language explanation generated before or alongside code to explain the reasoning process
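
The difference between an ad-hoc function and a general-purpose tool, and how a Program-of-Thought solution calls such a tool, can be illustrated with a minimal sketch. The function names (`solve_problem_17`, `triangle_area_sas`) and the triangle problem are hypothetical, chosen only for illustration; they do not come from MathFunc or SciToolBench.

```python
import math


# Ad-hoc function (hypothetical): the problem's numbers are hardcoded,
# so it answers exactly one problem instance and cannot be reused.
def solve_problem_17() -> float:
    return 0.5 * 3.0 * 4.0 * math.sin(math.radians(30.0))


# General-purpose tool (hypothetical): the same formula, parameterized,
# so it can be retrieved and reused for any problem of this type.
def triangle_area_sas(side_a: float, side_b: float, angle_deg: float) -> float:
    """Area of a triangle from two sides and the included angle in degrees."""
    return 0.5 * side_a * side_b * math.sin(math.radians(angle_deg))


# Program-of-Thought style solution: each reasoning step is an executable
# statement, and the comment lines play the role of the rationale.
# Step 1: read the given quantities from the question.
side_a, side_b, angle_deg = 3.0, 4.0, 30.0
# Step 2: call the retrieved general-purpose tool.
area = triangle_area_sas(side_a, side_b, angle_deg)
print(f"area = {area:.4f}")
```

Under cross-retrieval as defined above, only a function like `triangle_area_sas` would remain useful when retrieved for a different problem, since `solve_problem_17` takes no inputs.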