AST sub-tree matching: An evaluation method that parses generated code into an abstract syntax tree (AST) and checks whether the relevant function call and its arguments exist as a sub-tree of the reference solution, ignoring irrelevant formatting.
Self-Instruct: A framework for generating synthetic instruction-following data by prompting a strong LLM (like GPT-4) with a few seed examples to create diverse task instances.
Hallucination (API): When an LLM generates an API call that does not exist in the database or uses a non-existent library, distinct from using an existing API incorrectly.
APIBench: A dataset constructed in this paper containing over 11,000 {instruction, API} pairs derived from TorchHub, TensorHub, and HuggingFace model cards.
Retriever-Aware training: Training the LLM with the retrieved API documentation explicitly included in the prompt, teaching it to rely on the provided context rather than memorized knowledge.
Zero-shot (in this context): Providing the LLM with only the user prompt and no retrieved documentation or in-context examples during inference.
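
The AST sub-tree matching and Hallucination (API) entries above can be illustrated with a minimal sketch built on Python's standard `ast` module. This is not the paper's evaluation code. The helper names (`call_name`, `extract_calls`, `matches`, `classify`), the tuple layout for a reference call, the tiny two-entry database, and the rule that a reference's positional arguments must appear as a prefix of the generated call's arguments are all assumptions made for illustration.

```python
import ast


def call_name(node):
    """Return the dotted name of a call target, e.g. 'torch.hub.load', or None."""
    parts = []
    target = node.func
    while isinstance(target, ast.Attribute):
        parts.append(target.attr)
        target = target.value
    if isinstance(target, ast.Name):
        parts.append(target.id)
        return ".".join(reversed(parts))
    return None  # e.g. a call on the result of another call


def literal(node):
    """Constant arguments compare by value; anything else by its AST dump."""
    return node.value if isinstance(node, ast.Constant) else ast.dump(node)


def extract_calls(code):
    """Collect (name, positional_args, keyword_args) for every call in the code."""
    calls = []
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Call):
            name = call_name(node)
            if name is None:
                continue
            args = [literal(a) for a in node.args]
            kwargs = {k.arg: literal(k.value) for k in node.keywords if k.arg}
            calls.append((name, args, kwargs))
    return calls


def matches(call, ref):
    """True if the reference call is contained in the generated call."""
    name, args, kwargs = call
    ref_name, ref_args, ref_kwargs = ref
    return (
        name == ref_name
        and args[: len(ref_args)] == ref_args
        and all(kwargs.get(k) == v for k, v in ref_kwargs.items())
    )


def classify(generated_code, reference, database):
    """Label generated code as correct, wrong_api, hallucination, or unparseable."""
    try:
        calls = extract_calls(generated_code)
    except SyntaxError:
        return "unparseable"
    if any(matches(c, reference) for c in calls):
        return "correct"
    if any(matches(c, entry) for c in calls for entry in database):
        return "wrong_api"  # a real API, but not the one the task asked for
    return "hallucination"  # no call corresponds to any entry in the database


# Hypothetical reference and database entries, for illustration only.
REFERENCE = ("torch.hub.load", ["pytorch/vision", "resnet50"], {})
DATABASE = [REFERENCE, ("torch.hub.load", ["pytorch/vision", "densenet121"], {})]

GOOD = 'import torch\nm = torch.hub.load("pytorch/vision", "resnet50", pretrained=True)\nm.eval()'
OTHER = 'm = torch.hub.load("pytorch/vision", "densenet121")'
FAKE = 'm = torch.hub.load("pytorch/vision", "supernet9000")'

if __name__ == "__main__":
    for code in (GOOD, OTHER, FAKE):
        print(classify(code, REFERENCE, DATABASE))
```

Because the comparison runs on parsed call nodes rather than raw text, extra keyword arguments such as `pretrained=True`, surrounding statements, and whitespace do not affect the result. Only the call name and the arguments listed in the reference are compared.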