| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| Main comparison against state-of-the-art baselines on NExT-QA showing significant improvements over modular and some end-to-end methods. | ||||
| NExT-QA | Accuracy | 60.0 | 83.6 | +23.6 |
| NExT-QA | Accuracy | 63.6 | 83.6 | +20.0 |
| iVQA | Accuracy | 53.8 | 76.9 | +23.1 |
| ActivityNet-QA | Accuracy | 35.2 | 52.7 | +17.5 |
| Ablation study of Scene Graph integration strategies compared to VLM-only baseline. | ||||
| NExT-QA | Accuracy | 78.4 | 77.5 | -0.9 |
| iVQA | Accuracy | 72.0 | 75.7 | +3.7 |