TTE: Test-Time Tool Evolution—a paradigm where agents create and refine tools during the inference phase rather than relying on pre-defined libraries
TTE-Zero: A specific TTE setting where the agent starts with an empty tool library (L_0 = ∅) and evolves it from scratch
TTE-Adapt: A specific TTE setting where the agent adapts a pre-existing source library to a new target domain
SciEvo: A benchmark dataset introduced in this paper comprising 1,590 scientific reasoning tasks and 925 evolved tools
Atomic Tool Refinement: The process of breaking down complex generated code into minimal, reusable functional units ('cell tools')
TRR: Tool Reuse Rate—a metric measuring the proportion of generated tools that are successfully reused in subsequent tasks
Tabula Rasa: Latin for 'blank slate'—refers to the TTE-Zero setting where the agent starts with no prior tools
PCA: Principal Component Analysis, a dimensionality reduction technique used here to project tool embeddings into fewer dimensions so that clusters can be identified for taxonomy construction
SOTA: State-of-the-Art—the current best performance achieved by existing methods
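
As an illustration of the TRR entry above, the sketch below computes a reuse rate from a toy usage log. The log format, the function name `tool_reuse_rate`, and the tool names are all hypothetical. It assumes that "reused" means a tool is invoked in at least one task after the task that generated it; the paper's exact formula may differ.

```python
from typing import Dict, List, Set


def tool_reuse_rate(created_at: Dict[str, int], calls: List[Set[str]]) -> float:
    """Fraction of generated tools invoked in at least one later task.

    created_at: tool name -> index of the task that generated the tool
                (hypothetical log format, assumed for illustration).
    calls:      calls[i] is the set of tool names invoked while solving task i.
    """
    if not created_at:
        return 0.0  # no tools generated, so nothing could be reused
    reused = sum(
        1
        for tool, born in created_at.items()
        # Only tasks strictly after the creating task count as reuse.
        if any(tool in used for used in calls[born + 1:])
    )
    return reused / len(created_at)


# Toy log: three tools generated over four tasks.
created_at = {"unit_convert": 0, "molar_mass": 1, "fit_line": 2}
calls = [
    {"unit_convert"},                # task 0
    {"unit_convert", "molar_mass"},  # task 1
    {"fit_line"},                    # task 2
    {"molar_mass"},                  # task 3
]
trr = tool_reuse_rate(created_at, calls)
print(f"TRR = {trr:.2f}")
```

Here `unit_convert` and `molar_mass` are each called in a task after the one that created them, while `fit_line` is not, so two of the three tools count as reused.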