ToolE: The dataset introduced in this paper, comprising 21,127 user queries paired with tool descriptions, generated to test tool awareness and selection
Sycophancy: The tendency of an LLM to agree with the user's premise or prompt bias, potentially leading to unnecessary tool usage
Hallucination: In this context, the model selecting a tool that does not exist or inventing tool capabilities that are not present in the description
ReAct: Reasoning and Acting, a prompting paradigm where LLMs generate reasoning traces before executing actions (tools)
Overlapped issue: A scenario where a user query can be validly addressed by multiple distinct tools, complicating single-label evaluation
Direct diverse generation: A prompting strategy instructing the model to produce queries with distinct tones (requests, orders) and levels of detail
Emotional generation: A prompting strategy augmenting instructions with specific emotions (happiness, anger, depression) to generate more human-like queries
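
To make the "Overlapped issue" entry above concrete, here is a minimal Python sketch of how single-label scoring can undercount correct tool choices when several tools are valid for one query. The tool names, the example data, and both scoring functions are hypothetical illustrations, not the paper's evaluation code.

```python
from typing import List, Set


def single_label_accuracy(predictions: List[str], gold: List[str]) -> float:
    """Fraction of queries where the prediction equals the single annotated tool."""
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


def multi_label_accuracy(predictions: List[str], valid_sets: List[Set[str]]) -> float:
    """Fraction of queries where the prediction is any one of the valid tools."""
    correct = sum(p in valid for p, valid in zip(predictions, valid_sets))
    return correct / len(valid_sets)


# Hypothetical data: the second query can be handled by two distinct tools.
predictions = ["weather_api", "flight_search", "calculator"]
gold = ["weather_api", "trip_planner", "calculator"]
valid_sets = [
    {"weather_api"},
    {"trip_planner", "flight_search"},  # overlapped: both tools are acceptable
    {"calculator"},
]

print(single_label_accuracy(predictions, gold))       # penalizes the overlapped query
print(multi_label_accuracy(predictions, valid_sets))  # accepts either valid tool
```

In this toy data the second prediction is marked wrong under single-label scoring even though it names a tool that could serve the query, which is the complication the glossary entry describes.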