| Benchmark | Metric | Baseline | This Paper | Δ (This Paper − Baseline) |
|---|---|---|---|---|
| Chart-R1 achieves state-of-the-art performance on the standard ChartQA benchmark, outperforming both open-source and proprietary models. | ||||
| ChartQA | Accuracy | 80.3 | 83.9 | +3.6 |
| ChartQA | Accuracy | 81.6 | 83.9 | +2.3 |
| On the newly proposed ChartRQA benchmark, which requires complex multi-step reasoning, Chart-R1 improves accuracy by up to 33.3 points over existing methods. | ||||
| ChartRQA-Single | Accuracy | 46.1 | 78.4 | +32.3 |
| ChartRQA-Multi | Accuracy | 20.3 | 53.6 | +33.3 |
| ChartRQA-Multi | Accuracy | 53.3 | 53.6 | +0.3 |
| Ablation studies confirm the necessity of the two-stage training process. | ||||
| ChartRQA-Single | Accuracy | 73.4 | 78.4 | +5.0 |
| ChartQA | Accuracy | 81.6 | 83.9 | +2.3 |
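
The Δ column is consistent with a simple difference between the "This Paper" and "Baseline" columns. The following is a minimal sketch that re-derives each Δ from the rows above and flags any mismatch. The names `ROWS` and `check_deltas` are hypothetical helpers introduced only for this illustration, and rounding to one decimal place is an assumption based on the precision shown in the table.

```python
# Rows copied from the table above: (benchmark, baseline, this_paper, reported_delta).
# ROWS and check_deltas are illustrative names, not part of the paper.
ROWS = [
    ("ChartQA", 80.3, 83.9, 3.6),
    ("ChartQA", 81.6, 83.9, 2.3),
    ("ChartRQA-Single", 46.1, 78.4, 32.3),
    ("ChartRQA-Multi", 20.3, 53.6, 33.3),
    ("ChartRQA-Multi", 53.3, 53.6, 0.3),
    ("ChartRQA-Single", 73.4, 78.4, 5.0),
    ("ChartQA", 81.6, 83.9, 2.3),
]


def check_deltas(rows, ndigits=1):
    """Return the rows whose reported delta differs from this_paper - baseline.

    The difference is rounded to `ndigits` decimals (assumed to match the
    table's precision) to absorb floating-point noise.
    """
    mismatches = []
    for benchmark, baseline, ours, reported in rows:
        computed = round(ours - baseline, ndigits)
        if computed != reported:
            mismatches.append((benchmark, baseline, ours, reported, computed))
    return mismatches


if __name__ == "__main__":
    bad = check_deltas(ROWS)
    if bad:
        for row in bad:
            print("mismatch:", row)
    else:
        print("all reported deltas match this_paper - baseline")
```

An empty list from `check_deltas` means every reported Δ matches the recomputed difference at one-decimal precision.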