Agent-E significantly outperforms state-of-the-art baselines on the WebVoyager benchmark.

| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| WebVoyager | Success Rate | 52.7 | 73.2 | +20.5 |
| WebVoyager | Success Rate | 57.2 | 73.2 | +16.0 |
| WebVoyager (WolframAlpha) | Success Rate | 65.2 | 95.7 | +30.5 |
| WebVoyager | Average LLM Calls per Task | Not reported in the paper | 25 | Not reported in the paper |
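
The last column of the table appears to be the absolute difference between the "This Paper" and "Baseline" values. The following is a minimal sketch, assuming that reading, which recomputes the column from the three success-rate rows. The names `rows` and `delta` are illustrative and do not come from the paper.

```python
# Success-rate rows copied from the table above:
# (benchmark, baseline, this_paper, reported_difference)
rows = [
    ("WebVoyager", 52.7, 73.2, 20.5),
    ("WebVoyager", 57.2, 73.2, 16.0),
    ("WebVoyager (WolframAlpha)", 65.2, 95.7, 30.5),
]


def delta(baseline: float, this_paper: float) -> float:
    """Absolute difference, rounded to one decimal place.

    Rounding guards against floating-point noise in the subtraction.
    """
    return round(this_paper - baseline, 1)


# Check each recomputed difference against the value reported in the table.
for benchmark, baseline, this_paper, reported in rows:
    computed = delta(baseline, this_paper)
    status = "ok" if computed == reported else "mismatch"
    print(f"{benchmark}: {computed:+.1f} (reported {reported:+.1f}) {status}")
```

Under this assumption, all three reported differences match the recomputed values. The row for average LLM calls per task is left out because the baseline is not reported.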