
Teaching Code LLMs to Use Autocompletion Tools in Repository-Level Code Generation

Chong Wang, Jian Zhang, Yebo Feng, Tianlin Li, Weisong Sun, Yang Liu, Xin Peng
Nanyang Technological University, Singapore, Fudan University, China
ACM Transactions on Software Engineering and Methodology (2024)
Agent Benchmark

📝 Paper Summary

Multi-call tool use with flexible plan Code Generation
ToolGen teaches Code LLMs to invoke static analysis-based autocompletion tools during generation by fine-tuning on functions augmented with special trigger tokens, resolving repository-level dependency errors.
Core Problem
Code LLMs generating repository-level code lack awareness of project-specific dependencies (user-defined attributes/functions), leading to undefined-variable and no-member errors.
Why it matters:
  • Over 70% of functions in real-world repositories are not standalone, making standard Code LLMs ineffective for practical software engineering.
  • Existing tool-use methods (like ToolFormer) struggle with repository dependencies because they rely on generic APIs rather than context-aware program analysis.
  • Dependency errors (e.g., hallucinating non-existent class members) significantly impede the usability of generated code.
Concrete Example: When generating a method using 'self.', a standard Code LLM might hallucinate an attribute '_updates' (causing a no-member error). In contrast, ToolGen invokes a static analysis tool (Jedi) at 'self.', which inspects the class definition and correctly suggests the existing attribute '_registered_updates'.
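To make the example concrete, the sketch below mimics the kind of lookup an autocompletion tool performs at `self.`. It is not Jedi and does not use Jedi's API: it is a simplified stand-in built on Python's standard `ast` module, and the `Tracker` class and the `class_members` helper are hypothetical names introduced only for illustration.

```python
import ast

# Hypothetical repository class, used only for illustration.
SOURCE = '''
class Tracker:
    def __init__(self):
        self._registered_updates = []
        self.name = "tracker"

    def register(self, update):
        self._registered_updates.append(update)
'''

def class_members(source: str, class_name: str) -> set:
    """Collect method names and attributes assigned on `self` in a class."""
    members = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            for sub in ast.walk(node):
                if isinstance(sub, ast.FunctionDef):
                    members.add(sub.name)
                elif (isinstance(sub, ast.Attribute)
                      and isinstance(sub.ctx, ast.Store)
                      and isinstance(sub.value, ast.Name)
                      and sub.value.id == "self"):
                    # An assignment such as `self.x = ...` defines attribute x.
                    members.add(sub.attr)
    return members

members = class_members(SOURCE, "Tracker")
print("_updates" in members)             # the hallucinated attribute
print("_registered_updates" in members)  # the attribute that exists
```

A real tool such as Jedi also resolves inheritance, imports, and types across files, which this sketch ignores.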
Key Novelty
ToolGen: Repository-Aware Tool Integration via Trigger Insertion
  • Fine-tunes Code LLMs to predict a special <COMP> token at positions where the code accesses repository dependencies (such as class attributes); the token triggers an external autocompletion tool.
  • Integrates standard IDE-style static analysis tools (e.g., Jedi) directly into the LLM decoding loop to fetch valid identifiers from the project context.
  • Selects the best suggestion from the tool's list using a constrained greedy search by the LLM, bridging the gap between generative capability and strict repository constraints.
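The three points above can be read as one decoding loop, sketched below under heavy simplification. The model is replaced by a scripted token stream, the tool by a hypothetical `toy_suggest` function, and the LLM's constrained greedy search by a hypothetical `toy_score` function that ranks whole candidates; none of these names come from the paper.

```python
COMP = "<COMP>"  # trigger token named in the summary

def generate_with_tool(model_tokens, suggest, score):
    """Copy model tokens to the output, resolving each trigger via the tool."""
    output = []
    for token in model_tokens:
        if token == COMP:
            candidates = suggest(output)
            if not candidates:
                continue  # assumption: drop a trigger the tool cannot resolve
            # Simplification: pick the candidate the scorer prefers, standing
            # in for a constrained greedy search over the model's tokens.
            token = max(candidates, key=lambda c: score(output, c))
        output.append(token)
    return output

def toy_suggest(prefix):
    # Hypothetical tool: only offers members right after "self."
    if prefix[-2:] == ["self", "."]:
        return ["_registered_updates", "name", "register"]
    return []

def toy_score(prefix, candidate):
    # Hypothetical stand-in for the LLM's likelihood of a candidate.
    return 1.0 if "update" in candidate else 0.0

# A real model would condition later tokens on the chosen identifier;
# this scripted stream is fixed in advance.
scripted = ["return", " ", "self", ".", COMP, ".", "copy", "(", ")"]
result = "".join(generate_with_tool(scripted, toy_suggest, toy_score))
print(result)  # return self._registered_updates.copy()
```

Because the chosen identifier always comes from the tool's list, the output cannot contain a member the tool did not report, whatever the scorer prefers.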
Architecture
Figure 3: Overview of the ToolGen approach, split into Offline (Trigger Insertion & Fine-tuning) and Online (Tool-integrated Code Generation) phases.
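As a rough illustration of the offline trigger-insertion step, the sketch below marks a training function with the trigger token wherever a known repository member follows `self.`. The regular expression, the `insert_triggers` helper, and the assumption that dependencies are given as a plain set of names are all illustrative; the paper's own procedure for locating trigger positions may differ.

```python
import re

COMP = "<COMP>"  # trigger token named in the summary

def insert_triggers(code: str, dependencies: set) -> str:
    """Insert the trigger after `self.` when a known dependency follows."""
    def mark(match):
        name = match.group(1)
        if name in dependencies:
            return f"self.{COMP}{name}"
        return match.group(0)  # leave other attribute accesses untouched
    return re.sub(r"\bself\.([A-Za-z_]\w*)", mark, code)

line = "total = len(self._registered_updates) + self.local_count"
augmented = insert_triggers(line, {"_registered_updates"})
print(augmented)
# total = len(self.<COMP>_registered_updates) + self.local_count
```

Fine-tuning on text augmented this way is what would let a model learn where to emit the trigger on its own.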
Evaluation Highlights
  • Dependency Coverage (covering real repo dependencies) improved by 31.4% to 39.1% across CodeGPT, CodeT5, and CodeLlama compared to base models.
  • Static Validity Rate (passing dependency checks) increased by 44.9% to 57.7% on the 12,406-function benchmark.
  • Achieved 40.0% (CodeT5) and 25.0% (CodeLlama) improvement in Pass@1 on CoderEval tasks involving repository dependencies.
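For readers who want a feel for the first two metrics, the sketch below computes them under assumed definitions: Dependency Coverage as the share of reference dependencies that also appear in the generated functions, and Static Validity Rate as the share of generated functions with zero dependency errors. These definitions are inferred from the parenthetical descriptions above and may not match the paper's exact formulation; the sample data is invented for illustration.

```python
def dependency_coverage(generated_deps, reference_deps):
    """Share of reference dependencies also present in the generated code."""
    total = sum(len(ref) for ref in reference_deps)
    hits = sum(len(gen & ref) for gen, ref in zip(generated_deps, reference_deps))
    return hits / total if total else 0.0

def static_validity_rate(error_counts):
    """Share of generated functions with no dependency errors."""
    if not error_counts:
        return 0.0
    return sum(1 for n in error_counts if n == 0) / len(error_counts)

# Invented toy data: two functions, each with a set of dependency names.
reference = [{"_registered_updates", "name"}, {"register"}]
generated = [{"_registered_updates"}, {"register", "_updates"}]
coverage = dependency_coverage(generated, reference)

# Invented toy data: dependency-error counts for four generated functions.
validity = static_validity_rate([0, 2, 0, 0])
print(round(coverage, 3), validity)  # 0.667 0.75
```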
Breakthrough Assessment
7/10
Effectively tackles repository-level hallucinations by combining LLMs with traditional static analysis, though the scope is limited to identifier completion.