Tool2Vec: A method of representing tools by averaging the embeddings of user queries that successfully use those tools, rather than embedding the tool's text description
MLC: Multi-Label Classification, which frames tool retrieval as predicting a binary vector with one entry per tool, where 1 marks a relevant tool and 0 an irrelevant one
Recall@K: A metric measuring the proportion of all relevant items that appear in the top-K retrieved results
ToolRefiner: A second-stage model that takes the query and the embeddings of tools retrieved in the first stage to perform a more accurate binary classification of relevance
ToolBank: A new domain-specific tool retrieval dataset created by the authors using LLMs to generate natural queries and enforce tool co-occurrence
DeBERTa: Decoding-enhanced BERT with disentangled attention—a transformer model used here as the backbone for the classification and refinement tasks
dense retrieval: Finding relevant items by comparing vector representations (embeddings) of queries and items, typically using cosine similarity
nDCG: Normalized Discounted Cumulative Gain, a measure of ranking quality that rewards placing relevant items nearer the top of the list
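
To make several of the terms above concrete, here is a minimal sketch of Tool2Vec-style averaging, dense retrieval by cosine similarity, and the Recall@K and nDCG metrics. It is an illustration under stated assumptions, not the authors' implementation: the function names, the 2-D toy embeddings, and the tool names are all hypothetical, and nDCG is computed with binary relevance, which is one common convention.

```python
import math

def mean_vector(vectors):
    """Element-wise average of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def tool2vec(query_embeddings_by_tool):
    """Represent each tool by the mean embedding of the queries that used it."""
    return {tool: mean_vector(embs) for tool, embs in query_embeddings_by_tool.items()}

def retrieve(query_emb, tool_embs, k):
    """Dense retrieval: rank tools by cosine similarity and keep the top k."""
    ranked = sorted(tool_embs, key=lambda t: cosine(query_emb, tool_embs[t]), reverse=True)
    return ranked[:k]

def recall_at_k(retrieved, relevant):
    """Fraction of all relevant tools that appear in the retrieved list."""
    return len(set(retrieved) & set(relevant)) / len(relevant)

def ndcg_at_k(retrieved, relevant):
    """nDCG with binary relevance: a hit at 0-based rank i contributes 1 / log2(i + 2)."""
    dcg = sum(1.0 / math.log2(i + 2) for i, t in enumerate(retrieved) if t in relevant)
    ideal_hits = min(len(relevant), len(retrieved))
    ideal = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / ideal if ideal else 0.0

# Toy data (made up): embeddings of queries that successfully used each tool.
usage = {
    "weather": [[1.0, 0.0], [0.8, 0.2]],
    "calendar": [[0.0, 1.0], [0.2, 0.8]],
    "email": [[0.6, 0.6], [0.4, 0.4]],
}
tool_embs = tool2vec(usage)

query = [0.9, 0.2]                 # hypothetical embedding of a new query
relevant = {"weather", "calendar"}  # hypothetical ground-truth tools

top2 = retrieve(query, tool_embs, k=2)
recall = recall_at_k(top2, relevant)
ndcg = ndcg_at_k(top2, relevant)
print(top2, recall, round(ndcg, 3))
```

In this toy run the second relevant tool is missed, so Recall@2 is 0.5, while nDCG still credits the relevant tool that was ranked first. A real system would obtain the embeddings from a trained encoder rather than hand-written vectors.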