PDC data significantly improves performance over base models, and DM-SFT further improves over standard SFT.

| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| Internal Evaluation Dataset | Accuracy | 28.5 | 43.8 | +15.3 |
| Internal Evaluation Dataset | Accuracy | 43.8 | 49.8 | +6.0 |
| Internal Evaluation Dataset | Accuracy | 42.6 | 49.3 | +6.7 |
| Internal Evaluation Dataset | Accuracy | 43.9 | 49.7 | +5.8 |