
Confucius: Iterative Tool Learning from Introspection Feedback by Easy-to-Difficult Curriculum

Shen Gao, Zhengliang Shi, Minghang Zhu, Bowen Fang, Xin Xin, Pengjie Ren, Zhumin Chen, Jun Ma
Shandong University
AAAI Conference on Artificial Intelligence (2023)
Agent Reasoning

📝 Paper Summary

Tool-use post-training · Curriculum learning for agents · Self-evolving · Agentic reasoning
CTL trains LLMs to use tools via a multi-stage curriculum (easy-to-difficult) and an iterative self-instruction process that dynamically generates training data based on model introspection of past errors.
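The introspection-driven data loop can be pictured as "evaluate, find where the model fails, generate more data there". The sketch below is a hypothetical illustration, not the authors' code. It assumes that evaluation yields (tool, succeeded) records, and it allocates a fixed generation budget in proportion to each tool's failure rate. The proportional rule and the names `introspect`, `allocate_budget`, `isif_round`, and `generate_examples` are all assumptions made for illustration. In the real system the generator would be a self-instruction prompt to a teacher LLM, not a stub.

```python
from collections import Counter


def introspect(eval_results):
    """Per-tool failure rate from (tool_name, succeeded) evaluation records."""
    totals, failures = Counter(), Counter()
    for tool, ok in eval_results:
        totals[tool] += 1
        if not ok:
            failures[tool] += 1
    return {tool: failures[tool] / totals[tool] for tool in totals}


def allocate_budget(failure_rates, budget):
    """Split the generation budget across tools in proportion to failure rate.

    Assumption: proportional allocation; tools the model never fails on get 0.
    Rounding means the per-tool counts may not sum exactly to `budget`.
    """
    total = sum(failure_rates.values())
    if total == 0:
        return {tool: 0 for tool in failure_rates}
    return {tool: round(budget * rate / total) for tool, rate in failure_rates.items()}


def isif_round(evaluate, generate_examples, budget):
    """One hypothetical iteration: evaluate -> introspect -> targeted generation."""
    plan = allocate_budget(introspect(evaluate()), budget)
    new_data = []
    for tool, n in plan.items():
        if n > 0:
            new_data.extend(generate_examples(tool, n))
    return plan, new_data


# Usage with stubbed evaluation records and a stub example generator.
records = [("search", True), ("search", True),
           ("maps", False), ("maps", True), ("maps", False),
           ("weather", False), ("weather", True)]
plan, data = isif_round(lambda: records,
                        lambda tool, n: [f"{tool}-example-{i}" for i in range(n)],
                        budget=70)
print(plan)  # {'search': 0, 'maps': 40, 'weather': 30}
```

Because "search" never fails, it receives no new examples. The budget goes to "maps" and "weather", the tools the model currently struggles with. Repeating this round after each training pass gives the iterative behavior described above.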
Core Problem
Existing tool-learning methods often train on limited, simple toolsets using static self-instruction, failing to generalize to complex real-world scenarios requiring selection from massive tool libraries.
Why it matters:
  • Real-world applications involve thousands of tools, requiring models to distinguish between relevant and irrelevant candidates
  • Tool complexity varies significantly; simple static datasets fail to capture the nuance needed for complicated tools (e.g., navigation vs. simple search)
  • Standard self-instruction lacks feedback mechanisms, leading models to overfit to simple tools while failing to master intricate ones
Concrete Example: A Google Maps tool might need only coordinates for 'exploring', but it requires start/end points and preferences for 'planning a commute'. A model trained only on simple cases fails to provide the necessary parameters for the complex case.
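The failure mode in this example can be made concrete as a check for missing required parameters. The sketch below is hypothetical. The scenario names and parameter names (`latitude`, `longitude`, `start`, `end`, `preferences`) are illustrative assumptions and do not reflect the real Google Maps API. It shows that a call pattern learned from the simple case is valid for 'exploring' but incomplete for 'planning a commute'.

```python
# Hypothetical required parameters per usage scenario; NOT the real Google Maps API.
REQUIRED_PARAMS = {
    "explore": {"latitude", "longitude"},
    "plan_commute": {"start", "end", "preferences"},
}


def missing_params(scenario, call_args):
    """Sorted list of required parameters that the tool call does not supply."""
    return sorted(REQUIRED_PARAMS[scenario] - set(call_args))


# A call pattern a model might learn from simple "explore" examples only.
simple_call = {"latitude": 36.67, "longitude": 117.0}
print(missing_params("explore", simple_call))       # []
print(missing_params("plan_commute", simple_call))  # ['end', 'preferences', 'start']
```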
Key Novelty
Curriculum Tool Learning (CTL) with Iterative Self-instruction from Introspective Feedback (ISIF)
  • Decomposes training into three stages (Warm-up, In-category, Cross-category) to gradually increase difficulty from simple execution to complex selection from large libraries
  • Uses an iterative feedback loop where the model 'introspects' on its own failures to generate new, targeted training examples for tools it currently struggles with, rather than random sampling
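The three-stage decomposition can be sketched as a change in the candidate tool set shown to the model for each training example. The sketch below is a hypothetical reading of the stages, not the paper's implementation. Warm-up presents only the target tool, so the model just learns to execute it. In-category adds distractors from the target's own category. Cross-category draws distractors from the whole library. The library layout (category mapped to tool names), the distractor counts in `STAGES`, and the function names are all assumptions.

```python
import random

# Hypothetical curriculum: (stage name, max number of distractor tools).
STAGES = [("warm_up", 0), ("in_category", 3), ("cross_category", 8)]


def build_candidates(target, library, stage, k, rng):
    """Shuffled candidate list: the target tool plus up to k distractors.

    `library` maps category -> list of tool names (an assumed structure).
    """
    category = next(c for c, tools in library.items() if target in tools)
    if stage == "warm_up":
        pool = []  # single-tool execution: no selection needed
    elif stage == "in_category":
        pool = [t for t in library[category] if t != target]
    elif stage == "cross_category":
        pool = [t for tools in library.values() for t in tools if t != target]
    else:
        raise ValueError(f"unknown stage: {stage}")
    candidates = rng.sample(pool, min(k, len(pool))) + [target]
    rng.shuffle(candidates)
    return candidates


def curriculum(targets, library, seed=0):
    """Yield (stage, [(target, candidates), ...]) in easy-to-difficult order."""
    rng = random.Random(seed)
    for stage, k in STAGES:
        yield stage, [(t, build_candidates(t, library, stage, k, rng)) for t in targets]


library = {
    "map": ["google_map", "bing_map", "osm_route"],
    "weather": ["weather_now", "weather_forecast"],
    "search": ["web_search", "news_search"],
}
for stage, examples in curriculum(["google_map"], library):
    print(stage, examples[0][1])
```

With this toy library, the cross-category stage exhausts the six available distractors. In a realistic library of thousands of tools, `k` would cap the pool, and the model would have to pick the target out of many unrelated candidates.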
Evaluation Highlights
  • Outperforms ChatGPT (tuning-free) by +9.2% success rate on unseen tools in ToolBench
  • Surpasses GPT4Tools (tuning-based) by +13.5% success rate on unseen instructions
  • Achieves comparable performance to ChatGPT on unseen datasets while using a much smaller open-source backbone (e.g., LLaMA-7B)
Breakthrough Assessment
7/10
Strong methodological contribution in curriculum design and dynamic data generation for tool use. Demonstrates solid gains over both tuning-free and tuning-based baselines on standard benchmarks.