| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| **SFT on DESIGNER data outperforms base models and other synthetic baselines, by +3.0 to +6.1 points in the rows below.** | | | | |
| MMLU-Pro | Accuracy | 44.6 | 48.2 | +3.6 |
| GPQA-Diamond | Accuracy | 29.8 | 32.8 | +3.0 |
| MMLU-Pro | Accuracy | 42.1 | 48.2 | +6.1 |
| SciBench | Accuracy | 6.87 | 10.05 | +3.18 |
| **Ablation studies confirm the value of both the Book and Web sources and of the Design Logic method.** | | | | |
| Average (MMLU-Pro, GPQA, etc.) | Score | 44.6 | 46.2 | +1.6 |
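The Δ column is the absolute difference *This Paper − Baseline*, in the same units as the metric. The following minimal sketch recomputes it from the values in the table above and also reports the relative gain over each baseline. The row labels "row 1" and "row 2" are hypothetical names used here only to keep the two MMLU-Pro rows apart, since they are measured against different baselines. The helper names are illustrative and are not part of the paper's code.

```python
import math

# Values copied from the table: (label, baseline, this_paper, reported_delta).
# "row 1" / "row 2" are hypothetical labels for the two MMLU-Pro comparisons.
ROWS = [
    ("MMLU-Pro (row 1)", 44.6, 48.2, 3.6),
    ("GPQA-Diamond", 29.8, 32.8, 3.0),
    ("MMLU-Pro (row 2)", 42.1, 48.2, 6.1),
    ("SciBench", 6.87, 10.05, 3.18),
    ("Average (ablation)", 44.6, 46.2, 1.6),
]


def absolute_delta(baseline: float, ours: float, digits: int = 2) -> float:
    """Absolute improvement in metric points, rounded to suppress float noise."""
    return round(ours - baseline, digits)


def relative_gain(baseline: float, ours: float) -> float:
    """Improvement as a percentage of the baseline score."""
    return (ours - baseline) / baseline * 100


def check_rows(rows):
    """Return the labels of rows whose reported delta disagrees with the recomputed one."""
    mismatches = []
    for label, base, ours, reported in rows:
        if not math.isclose(absolute_delta(base, ours), reported, abs_tol=1e-9):
            mismatches.append(label)
    return mismatches


if __name__ == "__main__":
    for label, base, ours, reported in ROWS:
        print(
            f"{label}: delta={absolute_delta(base, ours):+.2f} "
            f"(reported {reported:+.2f}), relative={relative_gain(base, ours):+.1f}%"
        )
    print("mismatches:", check_rows(ROWS))
```

Absolute deltas are useful for comparing benchmarks on a common point scale. The relative view shows how large a gain is with respect to the starting score, which is worth checking when baselines differ widely (e.g. SciBench versus MMLU-Pro).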