| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|---|
| ToolE (single-tool) | nDCG@5 | 0.6522 | 0.7821 | +0.1299 |
| ToolE (multi-tool) | nDCG@5 | 0.5296 | 0.7231 | +0.1935 |
| ToolBench (I1 category) | nDCG@5 | 0.5962 | 0.6110 | +0.0148 |
| End-to-end evaluation showing Re-Invoke helps agents complete tasks better than supervised retrievers. | | | | | |
| ToolBench (Average) | Pass Rate | 52.63 | 56.07 | +3.44 |
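
For readers unfamiliar with the retrieval metric in the table, here is a minimal sketch of nDCG@5 under binary relevance (a tool is either relevant to the query or not). The function names `dcg_at_k` and `ndcg_at_k` and the sample tool names are hypothetical, and the exact nDCG variant used in the paper may differ (for example, graded relevance or a different gain function).

```python
import math


def dcg_at_k(gains, k=5):
    """Discounted cumulative gain over the first k gains (0-indexed ranks)."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_tools, relevant_tools, k=5):
    """nDCG@k with binary relevance (an assumption, not the paper's stated setup).

    ranked_tools: tool identifiers in the order the retriever returned them.
    relevant_tools: set of ground-truth tool identifiers for the query.
    """
    if not relevant_tools:
        return 0.0
    gains = [1.0 if tool in relevant_tools else 0.0 for tool in ranked_tools]
    # Ideal ranking places every relevant tool first, capped at k positions.
    ideal_gains = [1.0] * min(k, len(relevant_tools))
    return dcg_at_k(gains, k) / dcg_at_k(ideal_gains, k)


# Hypothetical query with two relevant tools; the retriever ranks one first
# and the other third.
score = ndcg_at_k(["weather", "maps", "calendar"], {"weather", "calendar"})
print(round(score, 4))
```

A score of 1.0 means every relevant tool appears at the top of the ranking, so the Δ column can be read as the absolute gain in this normalized score.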