DeepTravel significantly outperforms both general-purpose reasoning models and RL baselines on travel planning tasks.

| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| Travel Planning (General) | Performance Comparison | Not reported in the paper | Not reported in the paper | Not reported in the paper |