| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| ChefMind outperforms the ablation baselines in overall quality score by 2.0 points (1-10 scale). | ||||
| Xiachufang Custom Test Set | Average Score (1-10) | 6.7 | 8.7 | +2.0 |
| ChefMind is more robust, leaving only 1.6% of queries unprocessed versus 25.6% for the baseline. | ||||
| Xiachufang Custom Test Set | Unprocessed Queries Rate | 25.6% | 1.6% | -24.0 pp |
| Xiachufang Custom Test Set | Unprocessed Queries Count | 22 | 2 | -20 |