| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| Average (Paper-VISA & Wiki-VISA) | Soft EM improvement | Not reported (percentage gain only) | Not reported (percentage gain only) | +8.23% |
| Average (Paper-VISA & Wiki-VISA) | IoU@0.5 improvement | Not reported (percentage gain only) | Not reported (percentage gain only) | +47.0% |

LAT consistently improves over the vanilla model and the SFT baseline in both single-image and multi-image settings.