
Semantic Context for Tool Orchestration

Robert Müller
Aganthos
arXiv (2025)
Agent RL RAG Reasoning

📝 Paper Summary

Multi-call tool use with flexible plan; RL-based Agentic RAG pipeline
Providing agents with semantic descriptions of tools (Semantic Context) rather than opaque indices enables faster learning, better generalization, and robust adaptation to changing toolsets.
Core Problem
Naive tool orchestration treats tools as abstract indices in a large discrete action space, leading to inefficient learning and catastrophic forgetting when the toolset changes.
Why it matters:
  • Modern agents face dynamic environments where APIs are frequently added or removed, causing index-based policies to fail
  • Standard reinforcement learning approaches scale poorly with large discrete action spaces, requiring impractical amounts of interaction data
  • Treating actions as opaque IDs discards valuable prior knowledge contained in API documentation and docstrings
Concrete Example: When a 'Data Analyzer' tool is removed and replaced by a semantically similar 'Stats Calculator', an index-based agent must relearn the new tool's utility from scratch. A semantic agent sees the similar description and immediately generalizes its previous experience.
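The contrast in this example can be illustrated with a small sketch. It is not the paper's encoder: `embed` is a toy bag-of-words stand-in for a real text embedding model, and the three tool descriptions are hypothetical, written only for this illustration.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for a text encoder (assumption): bag-of-words counts."""
    return Counter(text.lower().split())

def one_hot(tool_name):
    """Index-style representation: each tool is its own orthogonal axis."""
    return Counter({tool_name: 1})

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical descriptions; the source names the tools but not their docs.
descriptions = {
    "Data Analyzer": "compute summary statistics for a numeric dataset",
    "Stats Calculator": "calculate summary statistics for numeric data",
    "Email Sender": "send an email message to a recipient",
}

# Index view: the replacement tool shares nothing with the removed one.
index_sim = cosine(one_hot("Data Analyzer"), one_hot("Stats Calculator"))

# Semantic view: the replacement tool is close to the removed one.
old = embed(descriptions["Data Analyzer"])
new_sim = cosine(old, embed(descriptions["Stats Calculator"]))
unrelated_sim = cosine(old, embed(descriptions["Email Sender"]))
```

Under the index view the similarity between the old and new tool is exactly zero, so nothing learned about one transfers to the other. Under the description-based view the replacement scores well above the unrelated tool, which is the signal a semantic agent can exploit.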
Key Novelty
Semantic Context (SC) for Action Representation
  • Represent agent actions (tools) not as one-hot vectors, but as dense embeddings derived from their natural language descriptions (Semantic Context)
  • Use a shared linear reward model over these semantic features, allowing the agent to predict the utility of unseen or new tools based on their similarity to known ones
  • Implement a 'Filter-Reason-Act' (FiReAct) pipeline that uses semantic similarity to retrieve a small candidate set of tools before reasoning, scaling to thousands of actions
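The shared linear reward model in the second point can be sketched as a standard LinUCB learner with a single parameter vector shared across all tools. This is a minimal sketch under stated assumptions, not the paper's SC-LinUCB implementation: the class name `SharedLinUCB`, the two-dimensional feature vectors, and the tool "Mail Dispatcher" are hypothetical, and real features would come from a text encoder.

```python
import math

class SharedLinUCB:
    """LinUCB with one parameter vector shared by every tool (arm)."""

    def __init__(self, dim, alpha=1.0, reg=1.0):
        self.dim = dim
        self.alpha = alpha  # exploration strength
        # Inverse of the regularized design matrix, starting at (reg * I)^-1.
        self.A_inv = [[(1.0 / reg if i == j else 0.0) for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim  # running sum of reward * features

    def _matvec(self, x):
        return [sum(self.A_inv[i][j] * x[j] for j in range(self.dim))
                for i in range(self.dim)]

    def predict(self, x):
        """Estimated reward theta^T x, with theta = A^-1 b."""
        theta = self._matvec(self.b)
        return sum(t * xi for t, xi in zip(theta, x))

    def ucb(self, x):
        """Estimated reward plus an exploration bonus."""
        Ax = self._matvec(x)
        width = math.sqrt(max(sum(xi * a for xi, a in zip(x, Ax)), 0.0))
        return self.predict(x) + self.alpha * width

    def select(self, tool_features):
        """Pick the tool name with the highest upper confidence bound."""
        return max(tool_features, key=lambda name: self.ucb(tool_features[name]))

    def update(self, x, reward):
        """Rank-one update of A^-1 via the Sherman-Morrison formula."""
        Ax = self._matvec(x)
        denom = 1.0 + sum(xi * a for xi, a in zip(x, Ax))
        for i in range(self.dim):
            for j in range(self.dim):
                self.A_inv[i][j] -= Ax[i] * Ax[j] / denom
        for i in range(self.dim):
            self.b[i] += reward * x[i]

# Hypothetical 2-d semantic features: [statistics-like, email-like].
bandit = SharedLinUCB(dim=2)
for _ in range(20):
    bandit.update([1.0, 0.0], reward=1.0)  # 'Data Analyzer' was useful
    bandit.update([0.0, 1.0], reward=0.0)  # 'Email Sender' was not

# Tools never seen during training, scored from their features alone.
new_tools = {"Stats Calculator": [0.9, 0.1], "Mail Dispatcher": [0.1, 0.9]}
stats_estimate = bandit.predict(new_tools["Stats Calculator"])
mail_estimate = bandit.predict(new_tools["Mail Dispatcher"])
chosen = bandit.select(new_tools)
```

Because the parameters are tied to feature dimensions rather than to tool identities, the estimate for the unseen 'Stats Calculator' is high from its first appearance. A per-index model would instead start that tool with no information.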
Architecture
Algorithm 1: Pseudocode for the FiReAct (Filter-Reason-Act) pipeline.
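Since the pseudocode itself is not reproduced here, the following is only a rough sketch of what a filter-reason-act loop could look like, not the paper's Algorithm 1. The function names, the tool registry, the bag-of-words `embed`, and the default reasoning step (which simply takes the top-ranked candidate in place of an LLM) are all assumptions made for illustration.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for a text encoder (assumption): bag-of-words counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def filter_tools(query, tools, k):
    """Filter: keep the k tools whose descriptions best match the query."""
    q = embed(query)
    ranked = sorted(tools, key=lambda t: cosine(q, embed(t["description"])),
                    reverse=True)
    return ranked[:k]

def pick_first(query, candidates):
    """Reason (placeholder): a real system would ask an LLM to choose."""
    return candidates[0]

def fireact(query, args, tools, k=2, reason_fn=pick_first):
    """Run filter, reason, and act once; return the tool name and result."""
    candidates = filter_tools(query, tools, k)
    chosen = reason_fn(query, candidates)
    return chosen["name"], chosen["run"](args)  # Act: call the chosen tool

# Hypothetical tool registry for illustration.
TOOLS = [
    {"name": "mean_calculator",
     "description": "compute the mean of a list of numbers",
     "run": lambda nums: sum(nums) / len(nums)},
    {"name": "word_counter",
     "description": "count the words in a text string",
     "run": lambda text: len(text.split())},
    {"name": "max_finder",
     "description": "find the largest of a list of numbers",
     "run": max},
]

shortlist = [t["name"] for t in
             filter_tools("compute the mean of these numbers", TOOLS, k=2)]
result = fireact("compute the mean of these numbers", [2, 4, 6], TOOLS)
```

The point of the filter step is that the reasoning step only ever sees `k` candidates, so its cost does not grow with the size of the full tool registry.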
Evaluation Highlights
  • In static settings, SC-LinUCB maintains low, near-optimal regret (~100), while index-based LinUCB incurs regret above 1000, at least an order of magnitude higher
  • In dynamic environments where tools are added or removed, SC-LinUCB shows zero performance drop, whereas baselines suffer catastrophic forgetting and large regret spikes
  • FiReAct pipeline with semantic context achieves ~90% accuracy on a 10,000+ tool benchmark, compared to ~75% for retrieval alone
Breakthrough Assessment
8/10
Provides a rigorous theoretical and empirical foundation for a widely used but under-analyzed practice (using tool descriptions). The connection between contextual bandits and LLM in-context learning is particularly insightful.