| Benchmark | Metric | Baseline | This Paper | Δ (This Paper − Baseline) |
|---|---|---|---|---|
| Main performance comparison on WebArena showing ASI superiority over static and text-skill baselines. | | | | |
| WebArena | Success Rate | 19.3 | 42.8 | +23.5 |
| WebArena | Success Rate | 31.5 | 42.8 | +11.3 |
| WebArena | Average Steps | 22.8 | 16.2 | -6.6 |
| WebArena | Average Steps | 18.1 | 16.2 | -1.9 |
| Ablation study on the format (Text vs Program) and verification of skills. | | | | |
| WebArena (Shopping) | Success Rate | 40.0 | 42.6 | +2.6 |
| WebArena (Shopping) | Success Rate | 33.2 | 37.4 | +4.2 |