Advancing and Benchmarking Personalized Tool Invocation for LLMs

📝 Paper Summary

Tool profiling, User-profile-based personalization

This paper introduces Personalized Tool Invocation, a task requiring LLMs to select tools and infer parameters based on user profiles, and proposes PTool, a framework for synthesizing the associated training data.

Core Problem

Existing tool-learning work focuses on call syntax and explicit instructions. It ignores user-specific constraints (e.g., a preference for a particular platform) and implicit context (e.g., parameters such as an address that the user leaves out), both of which are essential for real-world personalization.

Why it matters:

Real-world users often omit crucial details (e.g., delivery address) expecting the system to know them from context
Users have distinct preferences (e.g., speed vs. price) that dictate which tool to use among functionally similar options
Current LLMs lack updated knowledge and personalization capabilities, leading to generic or failed tool calls when user intent is implicit

Concrete Example: A user requests 'Order me a hamburger from KFC' without specifying the address. A standard model fails or hallucinates parameters, whereas a personalized model infers the 'work location' and 'current time' from the user profile to complete the API call.
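The hamburger example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's implementation: the tool name `order_food`, the profile fields (`work_address`, `home_address`, `work_hours`), and the rule that picks an address by time of day are all hypothetical.

```python
from datetime import time

# Hypothetical user profile; the field names are illustrative assumptions.
PROFILE = {
    "work_address": "12 Office Park",
    "home_address": "34 Maple Street",
    "work_hours": (time(9, 0), time(18, 0)),
}

def resolve_address(profile, now):
    """Pick a delivery address from the profile based on the current time."""
    start, end = profile["work_hours"]
    if start <= now <= end:
        return profile["work_address"]
    return profile["home_address"]

def build_call(query_args, profile, now):
    """Fill in parameters the query omitted, using the profile as the source."""
    args = dict(query_args)
    if "address" not in args:  # never override what the user stated explicitly
        args["address"] = resolve_address(profile, now)
    return {"tool": "order_food", "arguments": args}

# "Order me a hamburger from KFC" carries no address.
call = build_call({"restaurant": "KFC", "item": "hamburger"}, PROFILE, time(12, 30))
print(call)
```

A non-personalized model has no counterpart to `resolve_address`: it must either ask a follow-up question or guess the missing parameter.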

Key Novelty

PTool (Personalized Tool Data Synthesis Framework)

Defines two sub-tasks: 'Tool Preference' (selecting between similar tools based on user traits) and 'Profile-dependent Query' (inferring missing API parameters from user profiles)
Constructs a hierarchical 'API Tree' to generate functionally similar yet distinct tools (e.g., YouTube vs. TikTok) to test preference capabilities
Uses a bottom-up clustering approach to build feature trees and a top-down assignment strategy to generate diverse, realistic user profiles without redundancy
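The scenario, platform, and function layers of the API tree can be pictured as a nested mapping. The sketch below is an assumption-laden illustration: the scenario names, function names, and the `similar_tools` and `all_tools` helpers are hypothetical, and only the platform names come from the summary.

```python
# Hypothetical API tree: scenario -> platform -> function names.
API_TREE = {
    "short_video": {
        "YouTube": ["search_videos", "play_video"],
        "TikTok": ["search_videos", "play_video"],
    },
    "food_delivery": {
        "KFC": ["place_order"],
    },
}

def similar_tools(tree, scenario, function):
    """Return tools on different platforms that offer the same function in one scenario."""
    return sorted(
        f"{platform}.{function}"
        for platform, functions in tree[scenario].items()
        if function in functions
    )

def all_tools(tree):
    """Flatten the tree into a list of fully qualified tool names."""
    return [
        f"{platform}.{function}"
        for platforms in tree.values()
        for platform, functions in platforms.items()
        for function in functions
    ]

candidates = similar_tools(API_TREE, "short_video", "play_video")
print(candidates)
```

Because the candidates differ only in platform, a model can choose between them correctly only by consulting the user's preferences, which is what the Tool Preference sub-task tests.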

Architecture

[Figure: workflow of the PTool data synthesis framework]

Evaluation Highlights

Constructed PTBench (PersonalizedToolBench), the first benchmark for this task, containing 1,083 high-quality annotated data samples
Developed a multi-agent synthesis pipeline (User Agent + Assistant Agent) to generate profile-dependent queries that intentionally omit information available in the user profile

Breakthrough Assessment

7/10

First formal definition and benchmark for personalized tool invocation, addressing a critical gap in agentic AI. The score is limited by the synthetic nature of the data and by the lack of reported performance gains in the provided text snippet.

⚙️ Technical Details

Problem Definition

Setting: Personalized Tool Invocation

Inputs: User query q, Candidate tools T, User Profile P_u

Outputs: Tool call solution A containing selected tool t^i and parameters a^i_m, where some parameters are derived from P_u
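One possible way to represent these inputs and outputs in code is shown below. The class names, field names, and the two helper checks are hypothetical, chosen only to mirror q, T, P_u, and A; the paper's own data format may differ.

```python
from dataclasses import dataclass, field

@dataclass
class Tool:
    name: str
    required: list  # names of required parameters

@dataclass
class ToolCall:
    tool: str
    arguments: dict = field(default_factory=dict)

def is_valid(call, candidates):
    """True if the call selects a candidate tool and fills all its required parameters."""
    by_name = {t.name: t for t in candidates}
    tool = by_name.get(call.tool)
    return tool is not None and all(p in call.arguments for p in tool.required)

def profile_derived(call, query, profile):
    """Names of arguments whose values come from the profile rather than the query text."""
    profile_values = set(profile.values())
    return sorted(
        name
        for name, value in call.arguments.items()
        if value in profile_values and str(value) not in query
    )

q = "Order me a hamburger from KFC"                      # user query
T = [Tool("KFC.place_order", ["item", "address"])]       # candidate tools
P_u = {"work_address": "12 Office Park"}                 # user profile
A = ToolCall("KFC.place_order", {"item": "hamburger", "address": "12 Office Park"})

print(is_valid(A, T), profile_derived(A, q, P_u))
```

The `profile_derived` check makes the defining property of the task explicit: at least one argument in A cannot be recovered from q alone.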

Pipeline Flow

Tool Generation (API Tree Construction)
User Profile Construction (Feature Tree + Assignment)
Behavior Simulation (Role-Playing)
Query Generation (Multi-Agent Collaboration)

System Modules

Tool Generator (Data Synthesis)

Generate diverse tool APIs by expanding scenarios into platforms and then specific functions

Model or implementation: Advanced LLM (Specific model not named in text)

Profile Constructor (Data Synthesis)

Create diverse user profiles containing basic attributes and implicit preferences

Model or implementation: Advanced LLM (Specific model not named in text)

User Agent (Data Synthesis)

Simulate user queries based on profile, explicitly omitting details available in the profile

Model or implementation: Advanced LLM (role-playing)

Assistant Agent (Data Synthesis)

Generate the ground truth tool invocation solution

Model or implementation: Advanced LLM
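The division of labor between the User Agent and the Assistant Agent can be mimicked without any LLM. In the sketch below, plain functions stand in for the two LLM agents; the slot-based intent representation and all function names are assumptions made for illustration, not the paper's pipeline.

```python
def user_agent(intent, profile):
    """State only the slots a user would mention, dropping values the profile already holds."""
    known = set(profile.values())
    return {slot: value for slot, value in intent.items() if value not in known}

def assistant_agent(tool, intent):
    """Produce the ground-truth call from the full intent, including omitted slots."""
    return {"tool": tool, "arguments": dict(intent)}

def make_sample(tool, intent, profile):
    """Pair an incomplete query with its complete ground-truth call."""
    stated = user_agent(intent, profile)
    return {
        "query_slots": stated,
        "omitted": sorted(set(intent) - set(stated)),
        "label": assistant_agent(tool, intent),
    }

intent = {"restaurant": "KFC", "item": "hamburger", "address": "12 Office Park"}
profile = {"work_address": "12 Office Park"}
sample = make_sample("KFC.place_order", intent, profile)
print(sample)
```

Each sample then carries a query that is incomplete by construction and a label that is complete, so a model can match the label only by reading the profile.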

Novel Architectural Elements

Hierarchical top-down feature assignment strategy that generates a combinatorially large set of diverse user profiles without redundancy
Multi-agent loop specifically designed to generate 'incomplete' queries that mandate profile access for resolution
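A toy version of top-down assignment over a feature tree is sketched below. The tree contents are invented, and rejecting duplicate samples is a simple stand-in for the paper's redundancy control, whose exact mechanism is not described in this summary.

```python
import math
import random

# Hypothetical feature tree: category -> feature -> candidate values.
FEATURE_TREE = {
    "basic": {"age_group": ["18-25", "26-40", "41-60"], "city": ["Berlin", "Tokyo"]},
    "preferences": {"priority": ["speed", "price"]},
}

def profile_space_size(tree):
    """Number of distinct profiles: the product of value counts over all features."""
    return math.prod(
        len(values) for features in tree.values() for values in features.values()
    )

def assign_profile(tree, rng):
    """Walk the tree top-down and pick exactly one value for each feature."""
    return {
        feature: rng.choice(values)
        for features in tree.values()
        for feature, values in features.items()
    }

def sample_profiles(tree, n, seed=0):
    """Draw n distinct profiles, rejecting duplicates to avoid redundancy."""
    if n > profile_space_size(tree):
        raise ValueError("n exceeds the number of distinct profiles")
    rng = random.Random(seed)
    seen, profiles = set(), []
    while len(profiles) < n:
        profile = assign_profile(tree, rng)
        key = tuple(sorted(profile.items()))
        if key not in seen:
            seen.add(key)
            profiles.append(profile)
    return profiles

profiles = sample_profiles(FEATURE_TREE, 5)
print(profile_space_size(FEATURE_TREE), len(profiles))
```

The number of possible profiles is the product of the per-feature option counts, so it grows multiplicatively as features are added to the tree.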

Modeling

Base Model: Various open-source models (Specific names not listed in text snippet)

Training Method: Fine-tuning

Training Data:

1,083 annotated samples in PTBench
Data synthesized via PTool framework: Tool Generation -> User Profile Construction -> Behavior Simulation -> Query Generation

Compute: Not reported in the paper

Comparison to Prior Work

vs. ToolACE: PTool adds a 'platform' layer to the API tree to create functionally similar tools for preference testing
vs. Toolformer/ToolkenGPT: Prior work focuses on fundamental invocation (syntax/selection); this work adds user profile integration and preference reasoning

Limitations

Relies on synthetic data generation, which may not perfectly reflect real-world user unpredictability
Requires user profiles to be explicitly structured or inferred, which raises potential privacy or availability issues in real applications
The definition of 'implicit preferences' relies on LLM simulation rather than real user behavioral logs

Reproducibility

Code: https://github.com/hyfshadow/PTBench

Code and benchmark are publicly available. The paper describes the synthesis process in detail, but exact prompts and model weights are not explicitly linked in the text provided.

📊 Experiments & Results

Evaluation Setup

Personalized tool invocation where models must select tools and fill parameters based on user profiles

Benchmarks:

PTBench (Personalized Tool Invocation) [New]

Metrics:

Tool Preference Accuracy (implied)
Parameter Extraction Accuracy (implied)

Statistical methodology: Not explicitly reported in the paper
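Since the two metrics are only implied, their exact definitions are unknown. The sketch below shows one plausible reading, assuming exact-match scoring on tool names and on parameter values; the function names and the dictionary format of predictions and labels are hypothetical.

```python
def tool_accuracy(predictions, labels):
    """Fraction of samples whose predicted tool name matches the label."""
    if not labels:
        return 0.0
    hits = sum(p["tool"] == l["tool"] for p, l in zip(predictions, labels))
    return hits / len(labels)

def parameter_accuracy(predictions, labels):
    """Fraction of labelled parameters reproduced with exactly the right value."""
    total = correct = 0
    for p, l in zip(predictions, labels):
        for name, value in l["arguments"].items():
            total += 1
            correct += p["arguments"].get(name) == value
    return correct / total if total else 0.0

labels = [
    {"tool": "A", "arguments": {"x": 1, "y": 2}},
    {"tool": "B", "arguments": {"z": 3}},
]
predictions = [
    {"tool": "A", "arguments": {"x": 1, "y": 5}},
    {"tool": "C", "arguments": {"z": 3}},
]
print(tool_accuracy(predictions, labels), parameter_accuracy(predictions, labels))
```

Other choices are equally defensible, such as scoring parameters only when the tool is correct or requiring the whole call to match exactly.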

Main Takeaways

The paper frames tool invocation as conditional on user profiles, moving beyond context-free API calling.
The PTool framework successfully synthesizes 1,083 samples covering 'Tool Preference' and 'Profile-dependent Query' scenarios.
A hierarchical top-down assignment strategy allows for generating a combinatorially large number of diverse user profiles efficiently.

📚 Prerequisite Knowledge

Prerequisites

Understanding of LLM tool use (API calling)
Basic concepts of user profiling and personalization
Knowledge of synthetic data generation using LLMs

Key Terms

Personalized Tool Invocation: The task of selecting tools and extracting parameters by leveraging both the user query and user-specific profile information

Tool Preference: A sub-task where the model must choose between functionally similar tools (e.g., Amazon vs. Walmart) based on user traits (e.g., price sensitivity)

Profile-dependent Query: A query that omits necessary API parameters (e.g., address, phone number), requiring the model to infer them from the user's profile

PTool: The authors' proposed automated data synthesis framework for generating personalized tool invocation datasets

PTBench: PersonalizedToolBench, the benchmark dataset constructed using the PTool framework

API Tree: A hierarchical structure used in data generation, expanding from scenarios to platforms to specific API functions to ensure tool diversity

Implicit Preferences: User traits (e.g., price sensitivity) that are not explicitly stated in a profile but are inferred from historical behavior and used to guide tool selection