
Creative Robot Tool Use with Large Language Models

Mengdi Xu, Peide Huang, Wenhao Yu, Shiqi Liu, Xilun Zhang, Yaru Niu, Tingnan Zhang, Fei Xia, Jie Tan, Ding Zhao
Carnegie Mellon University, Google DeepMind
arXiv (2023)
Agent Reasoning Benchmark

📝 Paper Summary

Multi-call tool use with flexible plan · Multi-task planning · Embodied AI / Robotics
RoboTool is a modular LLM-based system that enables robots to solve long-horizon tasks requiring creative tool use by identifying implicit physical constraints and generating executable Python code.
Core Problem
Robots struggle with tasks involving implicit physical constraints (e.g., reaching objects outside the workspace, crossing wide gaps) that require creative tool use: improvising with available objects beyond their standard affordances.
Why it matters:
  • Traditional Task and Motion Planning (TAMP) relies on explicit optimization, which is computationally expensive and difficult to scale for complex, long-horizon tasks.
  • Existing LLM robotics methods often assume standard tool usage or static environments, failing when tasks require reasoning about physical properties like material, shape, or gap width to improvise solutions.
  • Creative tool use (using a surfboard as a bridge, or a hammer as a hook) is a hallmark of advanced intelligence that standard robotic control systems lack.
Concrete Example: A quadrupedal robot is told to 'walk to the other sofa,' but a 0.4 m gap separates the sofas, exceeding the robot's 0.1 m step limit. A standard planner fails because the gap constraint is implicit. RoboTool analyzes the scene, calculates the gap width, and decides to push a surfboard into place to bridge the gap.
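The reasoning in this example can be written out as a small check. The sketch below is illustrative only and is not RoboTool's code: the gap width and step limit come from the example, while the SceneObject class, the object list, the object lengths, and the "shortest object that still spans the gap" rule are all assumptions made for the demonstration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SceneObject:
    """Hypothetical scene entry; not a structure taken from the paper."""
    name: str
    length_m: float  # longest dimension, in meters
    movable: bool    # whether the robot can push it


def needs_bridge(gap_m: float, step_limit_m: float) -> bool:
    """Return True when the gap is wider than the robot can step across."""
    return gap_m > step_limit_m


def pick_bridge(objects: list[SceneObject], gap_m: float) -> Optional[SceneObject]:
    """Pick the shortest movable object that is longer than the gap (assumed rule)."""
    candidates = [o for o in objects if o.movable and o.length_m > gap_m]
    return min(candidates, key=lambda o: o.length_m, default=None)


# Gap and step limit are from the example; the objects and sizes are invented.
objects = [
    SceneObject("surfboard", length_m=1.5, movable=True),
    SceneObject("cushion", length_m=0.3, movable=True),
    SceneObject("bookshelf", length_m=2.0, movable=False),
]
gap_m, step_limit_m = 0.4, 0.1

if needs_bridge(gap_m, step_limit_m):
    bridge = pick_bridge(objects, gap_m)
    print(bridge.name if bridge else "no feasible bridge")
```

In RoboTool this judgment is made by an LLM reading a scene description rather than by a hand-written rule; the rule here only makes the implicit constraint explicit.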
Key Novelty
Modular LLM-based Creative Tool User (RoboTool)
  • Decomposes the planning process into four specialized LLM agents: Analyzer (identifies constraints), Planner (proposes strategies), Calculator (computes parameters), and Coder (writes executable code).
  • Explicitly prompts an LLM to function as a 'Calculator' to derive numerical parameters (e.g., target coordinates for a push) based on object affordances, bridging high-level reasoning with low-level control.
  • Enables three distinct types of creativity: Tool Selection (choosing correct tools), Sequential Tool Use (multi-step plans), and Tool Manufacturing (assembling/modifying objects).
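To make the Calculator's role concrete, here is a minimal sketch of the kind of arithmetic it is asked to produce, such as a target coordinate for a push. In RoboTool an LLM derives these numbers from the scene description; the plain functions below, the one-dimensional layout, the Gap class, and every coordinate and length except the 0.4 m gap width are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Gap:
    """Hypothetical 1-D gap between two surfaces along the x-axis."""
    start_x: float  # x-coordinate of the near edge, in meters
    end_x: float    # x-coordinate of the far edge, in meters

    @property
    def width(self) -> float:
        return self.end_x - self.start_x


def push_target_x(gap: Gap, board_length: float) -> float:
    """Return the board-center x that centers the board over the gap.

    Raises ValueError if the board is too short to rest on both edges.
    """
    if board_length <= gap.width:
        raise ValueError("board is too short to span the gap")
    return (gap.start_x + gap.end_x) / 2.0


def push_displacement(current_x: float, target_x: float) -> float:
    """Signed distance the board center must move along x."""
    return target_x - current_x


# A 0.4 m gap, as in the sofa example; edge positions are assumed.
gap = Gap(start_x=1.0, end_x=1.4)
# Board length and starting position are assumed values.
target = push_target_x(gap, board_length=1.5)
print(round(target, 3), round(push_displacement(0.5, target), 3))
```

Separating this step from planning means the plan can stay symbolic ("push the board over the gap") while the numbers are worked out and checked on their own.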
Architecture
Figure 2: The hierarchical architecture of RoboTool with its four key components.
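One way to picture how the four components fit together is as a chain of LLM calls, each with its own role instruction. The sketch below is an assumption-laden illustration, not the authors' implementation: the role instructions are paraphrased placeholders rather than the paper's prompts, the rule that each stage sees all earlier outputs is assumed, and `echo_llm` is a stand-in so the example runs without a model.

```python
from typing import Callable

# An "LLM" here is any function mapping a prompt string to a response string.
LLM = Callable[[str], str]

# Hypothetical role instructions; the paper's actual prompts are not reproduced.
ROLES = {
    "analyzer": "Identify the key physical constraint that makes the task hard.",
    "planner": "Propose a step-by-step plan that uses available objects as tools.",
    "calculator": "Compute the numerical parameters needed for each step.",
    "coder": "Write Python code that executes the plan with those parameters.",
}


def run_pipeline(llm: LLM, task: str) -> dict[str, str]:
    """Call the four roles in order, feeding each the accumulated context."""
    context = f"Task description:\n{task}"
    outputs: dict[str, str] = {}
    for role in ("analyzer", "planner", "calculator", "coder"):
        response = llm(f"{ROLES[role]}\n\n{context}")
        outputs[role] = response
        # Assumed design: later stages see everything produced so far.
        context += f"\n\n[{role} output]\n{response}"
    return outputs


def echo_llm(prompt: str) -> str:
    """Stand-in model for a dry run: returns the first line of the prompt."""
    return prompt.splitlines()[0]


result = run_pipeline(echo_llm, "Walk to the other sofa across a 0.4 m gap.")
for role, text in result.items():
    print(f"{role}: {text}")
```

Swapping `echo_llm` for a function that calls a real model leaves `run_pipeline` unchanged, which is the practical benefit of keeping the stages modular.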
Evaluation Highlights
  • Achieves 100% success rate on 'Sofa-Traversing' and 'Sofa-Climbing' tasks in simulation, compared to 0-10% for the 'Planner-Coder' baseline.
  • Outperforms the 'Coder' (Code-as-Policies style) baseline, which achieved near 0% success on most tasks, by a large margin across all 6 creative tasks.
  • Maintains high performance (0.7-0.9 success rates) in real-world experiments with a quadrupedal robot and robotic arm, despite perception noise.
Breakthrough Assessment
8/10
Significantly advances robotic reasoning by demonstrating zero-shot 'creative' behaviors (tool manufacturing/improvisation) using standard LLMs, solving problems traditional TAMP and direct coding methods fail at.