| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| Performance on Android-in-the-Wild (AitW): gains of 5.0 points in static action matching and 6.3 to 13.2 points in dynamic task completion. | | | | |
| AitW (Static) | Action Accuracy | 73.2 | 78.2 | +5.0 |
| AitW (Dynamic) | Success Rate | 39.6 | 52.8 | +13.2 |
| AitW (Dynamic) | Success Rate | 65.3 | 71.6 | +6.3 |
| Generalization to other benchmarks (GUI Odyssey and Mind2Web): gains of 3.2 and 2.1 points in action accuracy, suggesting robustness across domains. | | | | |
| GUI Odyssey | Action Accuracy | 78.9 | 82.1 | +3.2 |
| Mind2Web | Action Accuracy | 75.4 | 77.5 | +2.1 |
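
The Δ column is the absolute difference between the "This Paper" and "Baseline" scores. As a minimal sketch for readers who want to re-derive it, the snippet below recomputes each gain from the values copied out of the table. The names `ROWS`, `absolute_gain`, and `check_rows` are hypothetical helpers introduced here for illustration and are not part of the paper's code.

```python
# Values copied from the table: (benchmark, metric, baseline, this_paper, reported_delta).
# The two "AitW (Dynamic)" rows are kept exactly as they appear in the table.
ROWS = [
    ("AitW (Static)", "Action Accuracy", 73.2, 78.2, 5.0),
    ("AitW (Dynamic)", "Success Rate", 39.6, 52.8, 13.2),
    ("AitW (Dynamic)", "Success Rate", 65.3, 71.6, 6.3),
    ("GUI Odyssey", "Action Accuracy", 78.9, 82.1, 3.2),
    ("Mind2Web", "Action Accuracy", 75.4, 77.5, 2.1),
]


def absolute_gain(baseline: float, ours: float) -> float:
    """Absolute difference (this paper minus baseline), rounded to one decimal."""
    return round(ours - baseline, 1)


def check_rows(rows):
    """Return (benchmark, recomputed_delta, matches_reported) for each row."""
    results = []
    for name, _metric, baseline, ours, reported in rows:
        delta = absolute_gain(baseline, ours)
        # Compare with a small tolerance to avoid floating-point surprises.
        results.append((name, delta, abs(delta - reported) < 1e-6))
    return results


if __name__ == "__main__":
    for name, delta, ok in check_rows(ROWS):
        status = "consistent" if ok else "MISMATCH"
        print(f"{name}: recomputed gain = {delta:+.1f} ({status})")
```

The check only confirms that the reported gains agree with the two score columns; it says nothing about how the scores themselves were obtained.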