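Comparison against single-LLM baselines on ToolBench (in-domain): α-UMi improves planning accuracy (Plan ACC) and action exact match (Act. EM) and reduces the hallucination rate (Hallu.). Lower is better for Hallu.; higher is better for the other metrics. Δ is This Paper minus Baseline, in absolute points.

| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|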
| ToolBench (In-domain) | Plan ACC | 81.92 | 88.92 | +7.00 |
| ToolBench (In-domain) | Act. EM | 53.26 | 58.94 | +5.68 |
| ToolBench (In-domain) | Hallu. | 2.32 | 0.57 | -1.75 |
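
Real-time evaluation on ToolBench: execution pass rate of this paper's method against two different baselines (one per row).

| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|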
| ToolBench | Pass Rate | 60.7 | 70.9 | +10.2 |
| ToolBench | Pass Rate | 40.2 | 70.9 | +30.7 |
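
Ablation study on the Global-to-Local (reuse) strategy. Here, Baseline denotes the ablated variant without this strategy, so the drop in Act. EM shows that the strategy is necessary.

| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|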
| ToolBench (In-domain) | Act. EM | 45.11 | 58.94 | +13.83 |
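The Δ column is plain arithmetic, so it can be re-derived from the Baseline and This Paper columns. The following Python sketch does that and checks the sign convention. It assumes Hallu. is a lower-is-better rate and every other metric is higher-is-better. The `ResultRow` class and its field names are hypothetical illustrations, not part of α-UMi or its evaluation code.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ResultRow:
    """One row of the results table (hypothetical helper, not from the paper)."""
    benchmark: str
    metric: str
    baseline: float
    this_paper: float
    reported_delta: float
    lower_is_better: bool = False  # assumption: True only for Hallu.

    @property
    def delta(self) -> float:
        # Absolute difference in metric points: This Paper minus Baseline.
        return self.this_paper - self.baseline

    @property
    def improved(self) -> bool:
        # For lower-is-better metrics, a negative delta is an improvement.
        return self.delta < 0 if self.lower_is_better else self.delta > 0

    def matches_report(self, tol: float = 1e-6) -> bool:
        # Compare the recomputed delta with the value printed in the table.
        return math.isclose(self.delta, self.reported_delta, abs_tol=tol)


ROWS = [
    ResultRow("ToolBench (In-domain)", "Plan ACC", 81.92, 88.92, 7.00),
    ResultRow("ToolBench (In-domain)", "Act. EM", 53.26, 58.94, 5.68),
    ResultRow("ToolBench (In-domain)", "Hallu.", 2.32, 0.57, -1.75, lower_is_better=True),
    ResultRow("ToolBench", "Pass Rate", 60.7, 70.9, 10.2),
    ResultRow("ToolBench", "Pass Rate", 40.2, 70.9, 30.7),
    ResultRow("ToolBench (In-domain)", "Act. EM", 45.11, 58.94, 13.83),
]

if __name__ == "__main__":
    for row in ROWS:
        print(f"{row.metric:10s} delta={row.delta:+.2f} "
              f"matches={row.matches_report()} improved={row.improved}")
```

A tolerance-based comparison (`math.isclose`) is used instead of `==` because differences of binary floats such as `88.92 - 81.92` do not come out exactly at the two-decimal values shown in the table.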