LATM: LLMs As Tool Makers—the proposed framework where models generate their own tools to solve tasks.
PbE: Programming by Example—a paradigm where the model generates a program based on input-output examples.
CoT: Chain-of-Thought—a prompting technique where the model generates intermediate reasoning steps before the final answer.
Functional Cache: A caching mechanism that stores reusable tools (logic) capable of processing new inputs, unlike traditional caches that store static text responses.
Dispatcher: A lightweight model component that decides whether an incoming request can be solved by an existing tool or requires a new tool to be made.
BigBench: A collaborative benchmark for measuring the capabilities of large language models across diverse tasks.
Dyck Language: A task that requires tracking correctly nested brackets and parentheses, often used to test recursive reasoning.
Tool Maker: The powerful LLM (e.g., GPT-4) responsible for generating, verifying, and wrapping the Python tool.
Tool User: The lightweight LLM (e.g., GPT-3.5) responsible for calling the pre-made tool to solve specific instances.
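
To make the Functional Cache, Dispatcher, Tool Maker, and Tool User entries concrete, the following is a minimal sketch of how they could fit together. It is an illustration, not the framework's actual implementation. Every name in it (`dyck_is_balanced`, `FunctionalCache`, `dispatch`, `make_tool`) is hypothetical. A dictionary lookup stands in for the lightweight dispatcher model, a plain callable stands in for the tool-maker LLM, and the Dyck checker is hand-written rather than model-generated.

```python
from typing import Callable, Dict, Optional, Tuple

Tool = Callable[[str], object]


def dyck_is_balanced(s: str) -> bool:
    """Hypothetical tool: True if every bracket in s is closed in order.

    In LATM a function like this would be written by the tool maker;
    here it is hand-written for illustration.
    """
    pairs = {")": "(", "]": "[", "}": "{", ">": "<"}
    openers = set(pairs.values())
    stack = []
    for ch in s:
        if ch in openers:
            stack.append(ch)
        elif ch in pairs:
            # A closer must match the most recent unclosed opener.
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack  # leftover openers mean the string is unbalanced


class FunctionalCache:
    """Stores reusable logic per task type, not static text responses."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def get(self, task_type: str) -> Optional[Tool]:
        return self._tools.get(task_type)

    def put(self, task_type: str, tool: Tool) -> None:
        self._tools[task_type] = tool


def dispatch(
    task_type: str,
    payload: str,
    cache: FunctionalCache,
    make_tool: Callable[[str], Tool],
) -> Tuple[object, bool]:
    """Reuse a cached tool if one exists, otherwise ask the maker for one.

    Assumption: requests arrive already labelled with a task type. In the
    framework described above, a lightweight model makes this decision.
    Returns (result, made_new_tool).
    """
    tool = cache.get(task_type)
    made_new = tool is None
    if made_new:
        tool = make_tool(task_type)  # expensive step: strong model
        cache.put(task_type, tool)
    return tool(payload), made_new  # cheap step: apply the tool
```

Calling `dispatch("dyck", "([{}])", cache, maker)` on an empty cache invokes the maker once and stores the tool. A later call with a different input, such as `"([)]"`, reuses the stored function on the new input, which is the property that separates a functional cache from a cache of text responses.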