
CodeNav: Beyond tool-use to using real-world codebases with LLM agents

Tanmay Gupta, Luca Weihs, Aniruddha Kembhavi
PRIOR (Allen Institute for AI)
arXiv (2024)
Agent RAG Benchmark MM

📝 Paper Summary

Code generation Agentic tool use
CodeNav is an LLM agent that autonomously indexes, searches, and executes code from unseen repositories to solve user queries without requiring manual tool registration.
Core Problem
Standard tool-use requires meticulous manual registration of tools (descriptions/examples) and limits LLMs to a small set of functions, preventing them from leveraging full real-world codebases.
Why it matters:
  • Current methods constrain LLM expressiveness to a handful of pre-defined API calls rather than the vast functionality available in existing libraries
  • Scaling tool-use is difficult because manual description and registration of every function in a large codebase is impractical and exceeds context windows
  • Existing retrieval methods usually retrieve documentation, which may be imprecise or outdated compared to the actual source code
Concrete Example: A user asks to detect dogs in an image using the `transformers` library. A standard tool-use agent fails if the specific object detection pipeline isn't pre-registered. CodeNav searches the repository for `ObjectDetection`, imports the relevant classes, instantiates the model `facebook/detr-resnet-101`, and iteratively fixes execution errors to produce the result.
Key Novelty
Code-Use Paradigm (vs. Tool-Use)
  • Moves beyond 'registered' tools to 'code-use' where the agent indexes and searches the raw codebase (functions, classes) directly using Elasticsearch
  • Empowers the agent to define its own tools on the fly by importing and executing code found in the repository, rather than calling pre-defined APIs
  • Utilizes a multi-environment framework (Retrieval, Execution) with stateful memory to iteratively search, write code, and correct errors based on execution feedback
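The indexing and search steps can be pictured with a small sketch. It assumes Python 3.9+ and substitutes a toy in-memory keyword index for Elasticsearch. The names `CodeIndex` and `CodeEntry` and the scoring rule are hypothetical and are not taken from the paper. The standard `ast` module parses out each function and class definition and stores it with its raw source, so a query like `ObjectDetection` can match identifiers split on CamelCase and snake_case boundaries.

```python
from __future__ import annotations

import ast
import re
from dataclasses import dataclass


@dataclass
class CodeEntry:
    """One indexed definition (function or class) from a source file."""
    name: str
    kind: str    # "function" or "class"
    path: str
    source: str  # raw source text of the definition


def _tokens(text: str) -> set[str]:
    # Split CamelCase, ACRONYMS, and snake_case identifiers into lowercase tokens.
    words = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", text)
    return {w.lower() for w in words}


class CodeIndex:
    """Hypothetical in-memory stand-in for an Elasticsearch code index."""

    def __init__(self) -> None:
        self.entries: list[CodeEntry] = []

    def add_file(self, path: str, text: str) -> None:
        # Walk the syntax tree and index every function and class definition.
        tree = ast.parse(text)
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                kind = "class" if isinstance(node, ast.ClassDef) else "function"
                src = ast.get_source_segment(text, node) or ""
                self.entries.append(CodeEntry(node.name, kind, path, src))

    def search(self, query: str, k: int = 3) -> list[CodeEntry]:
        q = _tokens(query)
        scored = []
        for entry in self.entries:
            # Toy score: name matches count double, body matches count once.
            score = 2 * len(q & _tokens(entry.name)) + len(q & _tokens(entry.source))
            if score > 0:
                scored.append((score, entry))
        scored.sort(key=lambda pair: -pair[0])
        return [entry for _, entry in scored[:k]]


SAMPLE = '''
class ObjectDetectionPipeline:
    """Detects objects in images."""
    def __call__(self, image):
        return []

def load_image(path):
    return open(path, "rb").read()
'''

index = CodeIndex()
index.add_file("pipelines/object_detection.py", SAMPLE)
print([(hit.kind, hit.name) for hit in index.search("ObjectDetection")])
# prints [('class', 'ObjectDetectionPipeline')]
```

A real index such as Elasticsearch would provide proper relevance ranking over thousands of files. The hand-weighted overlap score here only illustrates that both identifier names and code bodies can be searched.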
Architecture
Figure 1: The CodeNav interaction framework, showing the agent loop with Retrieval and Execution environments.
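The agent loop in Figure 1 can be sketched in miniature. This is a toy built on stated assumptions, not CodeNav's implementation. A hypothetical `scripted_policy` stands in for the LLM, and `RetrievalEnv` searches a one-entry dict instead of Elasticsearch. `ExecutionEnv` keeps a persistent namespace, so definitions from one step remain visible in the next. Errors are caught and returned as observations, which lets the policy react to a `NameError` by defining the retrieved code and retrying.

```python
from __future__ import annotations

import contextlib
import io
import traceback
from typing import Callable


class ExecutionEnv:
    """Runs code in one persistent namespace so state survives across steps."""

    def __init__(self) -> None:
        self.namespace: dict = {}

    def step(self, code: str) -> str:
        buf = io.StringIO()
        try:
            # exec runs in-process with no sandboxing; fine for a toy only.
            with contextlib.redirect_stdout(buf):
                exec(code, self.namespace)
            return "OK\n" + buf.getvalue()
        except Exception:
            # The traceback becomes feedback the agent can use to fix its code.
            return "ERROR\n" + traceback.format_exc(limit=1)


class RetrievalEnv:
    """Returns source snippets matching a query (hypothetical search backend)."""

    def __init__(self, search: Callable[[str], list[str]]) -> None:
        self.search = search

    def step(self, query: str) -> str:
        return "\n".join(self.search(query)) or "No results"


def run_agent(policy, envs: dict, max_steps: int = 10) -> list[tuple[str, str, str]]:
    history = []  # stateful memory of (env_name, action, observation)
    for _ in range(max_steps):
        env_name, action = policy(history)
        if env_name == "done":
            break
        observation = envs[env_name].step(action)
        history.append((env_name, action, observation))
    return history


CODEBASE = {"square": "def square(x):\n    return x * x"}


def scripted_policy(history):
    """Hypothetical stand-in for the LLM: a fixed plan that reacts to errors."""
    if not history:
        return "retrieval", "square"
    env_name, action, obs = history[-1]
    if env_name == "retrieval":
        return "execution", "y = square(4)"  # forgets to define the helper
    if obs.startswith("ERROR"):
        return "execution", history[0][2]  # define the retrieved source
    if "y =" not in action:
        return "execution", "y = square(4)\nprint(y)"  # retry; state persists
    return "done", ""


envs = {
    "retrieval": RetrievalEnv(lambda q: [s for n, s in CODEBASE.items() if q in n]),
    "execution": ExecutionEnv(),
}
for env_name, action, obs in run_agent(scripted_policy, envs):
    print(f"[{env_name}] {obs.splitlines()[0]}")
```

Keeping every observation in `history` is what allows the policy to pull the retrieved source back out after the failed call, mirroring the retrieve, execute, and fix cycle described above.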
Evaluation Highlights
  • Achieves 47.9% success rate on m&m's benchmark, comparable to the Oracle Tool-Use upper bound (51.2%) that uses privileged, hand-crafted tool info
  • Outperforms Tool-Use (without oracle descriptions) on API-Bank (Level-1) with 73.2% vs 66.8% accuracy
  • Retrieving actual source code improves performance by ~4.5% compared to retrieving only function signatures/docstrings on the m&m's benchmark
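To make the ablation concrete, the sketch below contrasts what each retrieval variant would show the agent for the same function. The function `scale_boxes` and the helper `views` are hypothetical examples, and the code needs Python 3.9+ for `ast.unparse`. The signature-plus-docstring view hides the clipping behavior, while the full-source view exposes it. This is the kind of detail that documentation-only retrieval can miss.

```python
from __future__ import annotations

import ast

# Hypothetical library function whose docstring omits a behavior (clipping).
MODULE_SRC = '''
def scale_boxes(boxes, width, height):
    """Convert normalized boxes to pixel coordinates."""
    clip = lambda v, hi: min(max(v, 0), hi)  # clipping is not documented
    return [(clip(x0 * width, width), clip(y0 * height, height),
             clip(x1 * width, width), clip(y1 * height, height))
            for x0, y0, x1, y1 in boxes]
'''


def views(module_src: str, name: str) -> tuple[str, str]:
    """Return (signature+docstring view, full-source view) for one function."""
    tree = ast.parse(module_src)
    for node in tree.body:
        if isinstance(node, ast.FunctionDef) and node.name == name:
            doc = ast.get_docstring(node) or ""
            # Docs-style view: only the signature and the docstring.
            sig_view = f'def {node.name}({ast.unparse(node.args)}):\n    """{doc}"""'
            # Source-code view: the complete implementation text.
            src_view = ast.get_source_segment(module_src, node) or ""
            return sig_view, src_view
    raise KeyError(name)


sig_view, src_view = views(MODULE_SRC, "scale_boxes")
print(sig_view)
print("---")
print(src_view)
```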
Breakthrough Assessment
8/10
Strong shift from restrictive tool registration to open-ended codebase navigation. Staying competitive with oracle baselines without any manual tool registration is significant, though the evaluation so far uses standard tool-use benchmarks rather than massive repositories.