| Benchmark | Metric | Baseline (%) | This Paper (%) | Δ (pp) |
|---|---|---|---|---|
| **ALFWorld results:** AdaPlanner outperforms baselines using significantly fewer samples. | | | | |
| ALFWorld | Success Rate | 88.06 | 91.79 | +3.73 |
| ALFWorld | Success Rate | 61.94 | 91.79 | +29.85 |
| ALFWorld | Success Rate | 37.00 | 91.79 | +54.79 |
| **MiniWoB++ results:** AdaPlanner reaches a high success rate while using very few samples. | | | | |
| MiniWoB++ (with feedback) | Success Rate | 81.56 | 91.11 | +9.55 |
| MiniWoB++ (with feedback) | Success Rate | 38.50 | 91.11 | +52.61 |
| ALFWorld | Success Rate | 46.00 | 81.00 | +35.00 |
| ALFWorld | Success Rate | 19.00 | 38.00 | +19.00 |
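Each Δ in the table is the absolute difference between the two success rates (This Paper − Baseline), expressed in percentage points rather than as a relative improvement. The following is a minimal sketch in Python that recomputes every Δ from the rows as transcribed above. It assumes the values are success rates in percent. The names `ROWS` and `delta_pp` are hypothetical and introduced only for illustration.

```python
import math

# Rows transcribed from the table above:
# (benchmark, baseline %, this paper %, reported delta in percentage points)
ROWS = [
    ("ALFWorld", 88.06, 91.79, 3.73),
    ("ALFWorld", 61.94, 91.79, 29.85),
    ("ALFWorld", 37.00, 91.79, 54.79),
    ("MiniWoB++ (with feedback)", 81.56, 91.11, 9.55),
    ("MiniWoB++ (with feedback)", 38.50, 91.11, 52.61),
    ("ALFWorld", 46.00, 81.00, 35.00),
    ("ALFWorld", 19.00, 38.00, 19.00),
]


def delta_pp(baseline: float, ours: float) -> float:
    """Absolute gain in percentage points (not a relative percentage)."""
    return ours - baseline


# Compare within half a hundredth, since the table reports two decimals
# and float subtraction can leave tiny representation errors.
mismatches = [
    row for row in ROWS
    if not math.isclose(delta_pp(row[1], row[2]), row[3], abs_tol=0.005)
]

print(f"{len(ROWS)} rows checked, {len(mismatches)} mismatches")
```

The tolerance-based comparison is used instead of exact equality so that values such as 91.11 − 81.56 match the reported +9.55 despite binary floating-point rounding.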