Tool-use: A paradigm where LLMs invoke pre-defined external functions ('tools') that must be manually described and registered in the context
Code-use: A proposed paradigm where LLMs directly search, import, and execute source code from a repository without manual tool registration
Elasticsearch: A distributed search and analytics engine used here to index code snippets (classes, functions) for retrieval
Chain of Thought: A prompting technique where the model generates intermediate reasoning steps ('thoughts') before producing a final answer or action
ReAct: Reasoning + Acting; a paradigm where LLMs interleave reasoning traces with actions in an external environment
Docstrings: String literals placed at the start of a function, class, or module definition that describe its purpose, often used for documentation
Linting: Static code analysis that flags programming errors, stylistic issues, and suspicious constructs (e.g., with flake8)
Oracle Tool-Use: An upper-bound baseline where the agent is provided with perfect, hand-crafted descriptions of the exact tools needed to solve the task
Pass@1: A metric measuring the percentage of problems for which the model's first generated solution is correct
Success Rate: The percentage of evaluation episodes that successfully complete the user's task
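
To make the Code-use, Elasticsearch, and Docstrings entries more concrete, here is a minimal sketch of how classes and functions might be pulled out of source code, together with their docstrings, before being indexed for retrieval. It uses only Python's standard `ast` module. The function name `extract_snippets`, the record fields, and the sample source are hypothetical illustrations, not the indexing pipeline described in the source, and the step that would send the records to a search engine is left out.

```python
import ast

# Hypothetical sample source; a real repository file would be read from disk.
SOURCE = '''
class Greeter:
    """Builds greeting strings."""

    def greet(self, name):
        """Return a greeting for name."""
        return f"Hello, {name}!"

def add(a, b):
    return a + b
'''


def extract_snippets(source):
    """Return one record per class or function found in the source text.

    The record layout (name, kind, docstring, lineno) is an assumption
    made for illustration only.
    """
    tree = ast.parse(source)
    records = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            records.append({
                "name": node.name,
                "kind": "class" if isinstance(node, ast.ClassDef) else "function",
                # ast.get_docstring returns None when there is no docstring.
                "docstring": ast.get_docstring(node) or "",
                "lineno": node.lineno,
            })
    return records


snippets = extract_snippets(SOURCE)
for snippet in snippets:
    print(snippet["kind"], snippet["name"], repr(snippet["docstring"]))
```

Records like these could serve as the documents a retrieval index stores, with the docstring as searchable text. A function such as `add`, which has no docstring, still yields a record, but only its name is available for matching.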