| Benchmark | Metric | Baseline | This Paper | Δ (This Paper − Baseline) |
|---|---|---|---|---|
| Evaluation by visually impaired participants (N=4) comparing MM-VID against human-crafted descriptions. | ||||
| Audio Description User Study | Effectiveness of Delivery (0-10) | 8.33 | 7.14 | -1.19 |
| Audio Description User Study | Informativeness (0-10) | 9.29 | 7.14 | -2.15 |
| Audio Description User Study | Audio Quality (0-10) | 9.07 | 8.91 | -0.16 |
| Evaluation by sighted participants (N=5) assessing accuracy and synchronization. | ||||
| Audio Description User Study | Clarity/Accuracy (0-10) | 8.90 | 7.83 | -1.07 |
| Audio Description User Study | Timing and Synchronization (0-10) | 8.59 | 8.53 | -0.06 |
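
The Δ column appears to be the "This Paper" score minus the "Baseline" score, so a negative value means the baseline was rated higher. A minimal sketch that recomputes it from the table, assuming that sign convention; the names `scores` and `delta` are illustrative and not from the source:

```python
# Scores copied from the table above as (baseline, this_paper) pairs.
scores = {
    "Effectiveness of Delivery": (8.33, 7.14),
    "Informativeness": (9.29, 7.14),
    "Audio Quality": (9.07, 8.91),
    "Clarity/Accuracy": (8.90, 7.83),
    "Timing and Synchronization": (8.59, 8.53),
}


def delta(baseline: float, this_paper: float) -> float:
    """Return this_paper - baseline, rounded to two decimals (assumed convention)."""
    return round(this_paper - baseline, 2)


# Recompute the delta for every metric.
deltas = {metric: delta(b, p) for metric, (b, p) in scores.items()}

for metric, d in deltas.items():
    print(f"{metric}: {d:+.2f}")
```

Rounding to two decimals matches the precision of the reported scores and avoids floating-point noise in the differences.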