Main results on MVBench: VideoChat2 outperforms each of the listed existing open-source MLLMs and GPT-4V. Each row compares against a different baseline model.

| Benchmark | Metric | Baseline | This Paper | Δ (pts) |
|---|---|---|---|---|
| MVBench | Average Accuracy (%) | 35.5 | 51.1 | +15.6 |
| MVBench | Average Accuracy (%) | 43.5 | 51.1 | +7.6 |
| MVBench | Average Accuracy (%) | 32.7 | 51.1 | +18.4 |

Zero-shot QA performance on standard video benchmarks, indicating that the gains carry over beyond MVBench.

| Benchmark | Metric | Baseline | This Paper | Δ (pts) |
|---|---|---|---|---|
| ActivityNet-QA | Accuracy (%) | 35.2 | 49.1 | +13.9 |
| MSRVTT-QA | Accuracy (%) | 49.3 | 54.1 | +4.8 |
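The Δ column appears to be the absolute difference in percentage points (This Paper minus Baseline), rounded to one decimal. The sketch below recomputes it from the transcribed values to check that reading. The `rows` list and the `delta_pts` helper are hypothetical illustrations, not part of the paper.

```python
# Values transcribed from the tables above:
# (benchmark, baseline accuracy %, this paper accuracy %, reported delta)
rows = [
    ("MVBench", 35.5, 51.1, 15.6),
    ("MVBench", 43.5, 51.1, 7.6),
    ("MVBench", 32.7, 51.1, 18.4),
    ("ActivityNet-QA", 35.2, 49.1, 13.9),
    ("MSRVTT-QA", 49.3, 54.1, 4.8),
]


def delta_pts(baseline: float, ours: float) -> float:
    """Hypothetical helper: absolute gain in percentage points, rounded to one decimal."""
    # Rounding absorbs float noise such as 51.1 - 43.5 == 7.600000000000001.
    return round(ours - baseline, 1)


for bench, base, ours, reported in rows:
    computed = delta_pts(base, ours)
    # Fail loudly if a reported delta does not match the recomputed value.
    assert computed == reported, (bench, computed, reported)
    print(f"{bench}: {ours} - {base} = +{computed} pts")
```

Every reported Δ matches the recomputed absolute difference, so the column reports percentage-point gains rather than relative improvements.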