| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| Performance on the Berkeley Function-Calling Leaderboard (BFCL), showing improvements over the Zero-Shot baseline and Tool-Be-Honest. | | | | |
| BFCL | Accuracy | 73.33 | 86.67 | +13.34 |
| BFCL | Accuracy | 78.27 | 86.67 | +8.40 |
| BFCL | Accuracy | 87.05 | 90.00 | +2.95 |
| Ablation study on documentation quality (Doc Quality) using StableToolBench, showing robustness to missing information. | | | | |
| StableToolBench (Doc-40%) | Pass Rate | 57.8 | 75.0 | +17.2 |
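
The Δ column in the table above appears to be the absolute difference between the "This Paper" and "Baseline" columns. The following is a minimal sketch that recomputes it from the tabulated values. The names `ROWS`, `absolute_delta`, and `check_deltas` are hypothetical helpers introduced here for illustration, and rounding to two decimals is an assumption based on the precision shown in the table.

```python
import math

# Values copied from the table: (benchmark, metric, baseline, this_paper, reported_delta).
ROWS = [
    ("BFCL", "Accuracy", 73.33, 86.67, 13.34),
    ("BFCL", "Accuracy", 78.27, 86.67, 8.40),
    ("BFCL", "Accuracy", 87.05, 90.00, 2.95),
    ("StableToolBench (Doc-40%)", "Pass Rate", 57.8, 75.0, 17.2),
]


def absolute_delta(baseline: float, this_paper: float) -> float:
    """Absolute improvement, rounded to the table's two-decimal precision (assumed)."""
    return round(this_paper - baseline, 2)


def check_deltas(rows):
    """Return one boolean per row: does the recomputed delta match the reported one?"""
    return [
        math.isclose(absolute_delta(baseline, this_paper), reported, abs_tol=1e-9)
        for _, _, baseline, this_paper, reported in rows
    ]


if __name__ == "__main__":
    for row, ok in zip(ROWS, check_deltas(ROWS)):
        print(row[0], row[1], absolute_delta(row[2], row[3]), "matches" if ok else "differs")
```

If the tabulated values are percentages, these differences would be percentage points rather than relative gains.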