| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| Main results on the GAIA validation set show Alita outperforming state-of-the-art generalist agents. | | | | |
| GAIA | Pass@1 | 67.36 | 75.15 | +7.79 |
| GAIA | Pass@3 | Not reported | 87.27 | N/A |
| MathVista | Pass@1 | 68 | 74 | +6 |
| PathVQA | Pass@1 | 47 | 52 | +5 |
| Ablation on reuse of generated MCPs shows that tools created by stronger models improve the accuracy of weaker models. | | | | |
| GAIA Level 3 | Accuracy | 3.85 | 11.54 | +7.69 |
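
The Δ column appears to be the absolute difference between the "This Paper" and "Baseline" scores. The following is a minimal sketch that recomputes it from the values in the table, assuming that reading is correct; the names `ROWS`, `delta`, and `deltas` are illustrative and do not come from the paper.

```python
# Rows copied from the table: (benchmark, metric, baseline, this_paper).
# None marks a value the table lists as not reported.
ROWS = [
    ("GAIA", "Pass@1", 67.36, 75.15),
    ("GAIA", "Pass@3", None, 87.27),
    ("MathVista", "Pass@1", 68, 74),
    ("PathVQA", "Pass@1", 47, 52),
    ("GAIA Level 3", "Accuracy", 3.85, 11.54),
]


def delta(baseline, ours):
    """Return ours - baseline rounded to two decimals, or None if a value is missing."""
    if baseline is None or ours is None:
        return None
    return round(ours - baseline, 2)


# Assumption: the delta is a plain difference in score points, not a relative change.
deltas = {(bench, metric): delta(base, ours) for bench, metric, base, ours in ROWS}

for (bench, metric), d in deltas.items():
    label = "not computable" if d is None else f"{d:+.2f}"
    print(f"{bench} {metric}: {label}")
```

Under this assumption, the recomputed values match the Δ column for every row where both scores are reported. The Pass@3 row has no baseline, so no difference can be computed for it.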