| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| AgriGPT-VL outperforms general-purpose flagship models on the specialized AgriBench-VL-4K benchmark. | ||||
| AgriBench-VL-4K | Pairwise Win Rate (LLM-judge) | Not reported | Not reported | - |
| Ablation studies show the necessity of the multi-agent refinement pipeline. | ||||
| Internal Validation | Filter Rate (Correctness/Grounding) | 100.0 | 92.0 | -8.0 |
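
The two metrics in the table can be illustrated with a minimal sketch. The paper's exact aggregation is not given here, so the following is an assumption rather than the authors' procedure: the function names `pairwise_win_rate` and `delta`, the verdict labels `"A"`, `"B"`, and `"tie"`, and the choice to count a tie as half a win are all hypothetical. The sample verdicts are invented for illustration and are not results from the paper.

```python
from collections import Counter


def pairwise_win_rate(verdicts, ties_as_half=True):
    """Return model A's win rate over model B, in percent.

    `verdicts` holds one judge label per comparison: "A", "B", or "tie".
    Counting a tie as half a win is an assumption, not the paper's rule.
    """
    counts = Counter(verdicts)
    total = counts["A"] + counts["B"] + counts["tie"]
    if total == 0:
        raise ValueError("no valid verdicts to aggregate")
    wins = counts["A"] + (0.5 * counts["tie"] if ties_as_half else 0.0)
    return 100.0 * wins / total


def delta(baseline, this_paper):
    """Return the Δ column: this paper's value minus the baseline value."""
    return round(this_paper - baseline, 1)


# Invented judge verdicts, for illustration only.
sample_verdicts = ["A", "A", "B", "tie"]
print(pairwise_win_rate(sample_verdicts))  # 62.5

# Δ for the Internal Validation row: 92.0 - 100.0
print(delta(100.0, 92.0))  # -8.0
```

The second call reproduces the Δ reported in the Internal Validation row. The first shows why a win rate needs a stated tie policy before numbers from different evaluations can be compared.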