
Advancing and Benchmarking Personalized Tool Invocation for LLMs

Xu Huang, Yuefeng Huang, Weiwen Liu, Xingshan Zeng, Yasheng Wang, Ruiming Tang, Hong Xie, Defu Lian
University of Science and Technology of China, Shanghai Jiao Tong University, Huawei Noah’s Ark Lab
arXiv (2025)
P13N Agent Benchmark

📝 Paper Summary

Tool profiling, User-profile based personalization
This paper introduces Personalized Tool Invocation, a task requiring LLMs to select tools and infer parameters based on user profiles, and proposes PTool, a framework for synthesizing the associated training data.
Core Problem
Existing tool learning focuses on call syntax and explicit instructions. It ignores user-specific constraints (e.g., preferences for specific platforms) and implicit context (e.g., missing parameters such as addresses), both of which are essential for real-world personalization.
Why it matters:
  • Real-world users often omit crucial details (e.g., delivery address) expecting the system to know them from context
  • Users have distinct preferences (e.g., speed vs. price) that dictate which tool to use among functionally similar options
  • Current LLMs lack up-to-date knowledge and personalization capabilities, leading to generic or failed tool calls when user intent is implicit
Concrete Example: A user requests 'Order me a hamburger from KFC' without specifying the address. A standard model fails or hallucinates parameters, whereas a personalized model infers the 'work location' and 'current time' from the user profile to complete the API call.
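The example above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's method: the profile fields (`home_address`, `work_address`, `work_hours`), the `order_food` API name, and the rule "use the work address during weekday working hours" are all assumptions made here, and the current time is passed in as a separate argument for simplicity.

```python
from datetime import datetime

# Hypothetical user profile; field names are illustrative, not from the paper.
PROFILE = {
    "home_address": "12 Maple Street",
    "work_address": "88 Innovation Road",
    "work_hours": (9, 18),  # 24-hour clock, start inclusive, end exclusive
}

def infer_delivery_address(profile, now):
    """Pick the work address during weekday working hours, else the home address."""
    start, end = profile["work_hours"]
    at_work = now.weekday() < 5 and start <= now.hour < end
    return profile["work_address"] if at_work else profile["home_address"]

def build_order_call(restaurant, item, profile, now):
    """Assemble a tool call, filling the omitted address from the profile."""
    return {
        "name": "order_food",  # hypothetical API name
        "arguments": {
            "restaurant": restaurant,
            "item": item,
            "address": infer_delivery_address(profile, now),
        },
    }

# A Wednesday at 12:30, so the sketch resolves the address to the workplace.
call = build_order_call("KFC", "hamburger", PROFILE, datetime(2025, 3, 5, 12, 30))
```

In the task as described, an LLM is expected to make this inference itself; the hand-written rule only makes the expected behavior concrete.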
Key Novelty
PTool (Personalized Tool Data Synthesis Framework)
  • Defines two sub-tasks: 'Tool Preference' (selecting between similar tools based on user traits) and 'Profile-dependent Query' (inferring missing API parameters from user profiles)
  • Constructs a hierarchical 'API Tree' to generate functionally similar yet distinct tools (e.g., YouTube vs. TikTok) to test preference capabilities
  • Uses a bottom-up clustering approach to build feature trees and a top-down assignment strategy to generate diverse, realistic user profiles without redundancy
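The 'Tool Preference' sub-task might be pictured as follows. This is a rough sketch under stated assumptions: the nested-dictionary layout of the API tree, the `preferred_platforms` profile field, and the `Platform.function` naming are hypothetical. In the task itself the preference has to be inferred from user traits rather than read from an explicit field.

```python
# Hypothetical API tree: category -> functionality -> functionally similar platforms.
API_TREE = {
    "video": {
        "search_video": ["YouTube", "TikTok"],
    },
}

def select_tool(category, function, profile):
    """Choose among sibling tools that share a functionality, guided by the profile."""
    candidates = API_TREE[category][function]
    # Honor the first stated preference that offers this functionality.
    for platform in profile.get("preferred_platforms", []):
        if platform in candidates:
            return f"{platform}.{function}"
    # Without a matching preference, fall back to the first candidate.
    return f"{candidates[0]}.{function}"

tool = select_tool("video", "search_video", {"preferred_platforms": ["TikTok"]})
```

Because sibling leaves share a functionality, a query such as "find me a cooking video" is satisfiable by either platform, and only the profile distinguishes the correct call.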
Architecture
Figure 2: The PTool data synthesis framework workflow.
Evaluation Highlights
  • Constructed PTBench (PersonalizedToolBench), the first benchmark for this task, containing 1,083 high-quality annotated data samples
  • Developed a multi-agent synthesis pipeline (User Agent + Assistant Agent) to generate profile-dependent queries that intentionally omit information available in the user profile
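The idea of a profile-dependent query can be illustrated with a deterministic stand-in for the agent pipeline. The function below is an assumption-laden sketch, not the paper's procedure: it splits the arguments of a target call into those recoverable from the profile (which the query may leave out) and those the query must state. The function name, field names, and exact-match rule are hypothetical.

```python
def split_parameters(call_args, profile):
    """Separate arguments the profile already supplies from those the query must state."""
    profile_values = set(profile.values())
    implicit = {k: v for k, v in call_args.items() if v in profile_values}
    explicit = {k: v for k, v in call_args.items() if v not in profile_values}
    return implicit, explicit

profile = {"work_address": "88 Innovation Road", "phone": "555-0100"}
target_call = {
    "restaurant": "KFC",
    "item": "hamburger",
    "address": "88 Innovation Road",
}

implicit, explicit = split_parameters(target_call, profile)
# implicit holds the address; explicit holds the restaurant and the item.
```

A query built only from the `explicit` arguments ("Order me a hamburger from KFC") then requires the model to recover the `implicit` ones from the profile.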
Breakthrough Assessment
7/10
First formal definition and benchmark for personalized tool invocation, addressing a critical gap in agentic AI. The score is limited by the synthetic nature of the data and by the absence of reported performance gains in the provided text snippet.