| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| Overall performance comparison showing the gap between simple task completion (TFS) and efficient execution (TEFS). | | | | |
| MCPAgentBench | TFS | 48.1 | 71.6 | +23.5 |
| MCPAgentBench | TEFS | 33.5 | 57.7 | +24.2 |
| MCPAgentBench | TEFS | 57.7 | 39.4 | -18.3 |
| Specific analysis of parallel task performance reveals extreme strategic differences between models. | | | | |
| MCPAgentBench (Dual Parallel) | TEFS | 100.0 | 0.0 | -100.0 |
| Efficiency metrics regarding token consumption and time. | | | | |
| MCPAgentBench | Token Efficiency | lowest | highest | positive |