| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---:|---:|---:|
| tau2-Bench | Avg Accuracy | 18.59 | 26.76 | +8.17 |
| BFCL-V4 (Multi-Turn) | Overall Accuracy | 26.75 | 36.50 | +9.75 |
| BFCL-V4 (Web Search) | Overall Accuracy | 13.50 | 27.50 | +14.00 |
| ToolSandbox | Overall Score | 56.19 | 68.20 | +12.01 |
| tau2-Bench (In-domain) | Avg Accuracy | 32.81 | 41.39 | +8.58 |
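
As a sanity check on the Δ column, the sketch below recomputes each difference as This Paper minus Baseline and compares it with the reported value. The names `ROWS` and `recompute_deltas` are illustrative and not from the source. The numbers are copied from the table, and `Decimal` is used only to avoid floating-point rounding noise.

```python
from decimal import Decimal

# (benchmark, baseline, this paper, reported delta), copied from the table above.
ROWS = [
    ("tau2-Bench", "18.59", "26.76", "8.17"),
    ("BFCL-V4 (Multi-Turn)", "26.75", "36.50", "9.75"),
    ("BFCL-V4 (Web Search)", "13.50", "27.50", "14.00"),
    ("ToolSandbox", "56.19", "68.20", "12.01"),
    ("tau2-Bench (In-domain)", "32.81", "41.39", "8.58"),
]


def recompute_deltas(rows):
    """Return {benchmark: this_paper - baseline} using exact decimal arithmetic."""
    return {name: Decimal(ours) - Decimal(base) for name, base, ours, _ in rows}


deltas = recompute_deltas(ROWS)

for name, base, ours, reported in ROWS:
    # Each recomputed difference should match the reported delta column.
    assert deltas[name] == Decimal(reported), name
    print(f"{name}: {base} -> {ours} (+{deltas[name]})")
```

All five assertions pass, so the Δ column is consistent with the Baseline and This Paper columns as absolute differences in the metric's own units.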