ExeVRM 8B outperforms both proprietary and open-weight baselines on the ExeVR-Bench outcome assessment task.

| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| ExeVR-Bench | Accuracy | 80.3 | 84.7 | +4.4 |
| ExeVR-Bench | Accuracy | 75.0 | 84.7 | +9.7 |
| ExeVR-Bench | Recall | 74.7 | 87.7 | +13.0 |
| ExeVR-Bench | Recall | 66.5 | 87.7 | +21.2 |
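
The Δ column appears to be the absolute difference between the "This Paper" and "Baseline" scores. The following minimal sketch recomputes it from the rows above under that assumption; the names `ROWS` and `delta` are illustrative and do not come from the source.

```python
# Rows copied from the table: (metric, baseline, this paper, reported delta).
ROWS = [
    ("Accuracy", 80.3, 84.7, 4.4),
    ("Accuracy", 75.0, 84.7, 9.7),
    ("Recall", 74.7, 87.7, 13.0),
    ("Recall", 66.5, 87.7, 21.2),
]


def delta(baseline: float, ours: float) -> float:
    """Absolute gain in points, rounded to one decimal as in the table.

    Assumption: the delta is (this paper - baseline), not a relative change.
    """
    return round(ours - baseline, 1)


if __name__ == "__main__":
    for metric, baseline, ours, reported in ROWS:
        computed = delta(baseline, ours)
        status = "ok" if computed == reported else "mismatch"
        print(f"{metric}: {baseline} -> {ours}, delta {computed:+.1f} ({status})")
```

Rounding to one decimal avoids spurious mismatches caused by floating-point subtraction.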