| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| WebShop | Token Distribution (Thought) | 100 | 45.1 | N/A |
| Average across 5 benchmarks | Performance | Not reported in the paper | Not reported in the paper | Not reported in the paper |

Agent-Omit consistently achieves high accuracy comparable to frontier models while reducing costs.