| Benchmark | Metric | Baseline | SampleLLM (this paper) | Δ (abs.) |
|---|---|---|---|---|
| **Machine Learning Efficacy (MLE):** models are trained only on synthetic data and evaluated on real test data. SampleLLM improves AUC over the baseline on all three datasets shown. | | | | |
| MIND | AUC | 0.5824 | 0.5891 | +0.0067 |
| Alibaba | AUC | 0.5731 | 0.5802 | +0.0071 |
| ML-100k | AUC | 0.6401 | 0.6511 | +0.0110 |
| **Data Augmentation:** synthetic data is added to the real training set. | | | | |
| MIND | AUC | 0.6135 | 0.6272 | +0.0137 |
| Alibaba | AUC | 0.5815 | 0.5983 | +0.0168 |
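The Δ column is consistent with an absolute AUC difference (This Paper minus Baseline) in every row. The sketch below recomputes it from the table and also reports the relative gain, which the table does not show. The dictionary `RESULTS` and the helper `auc_gain` are hypothetical names introduced only for illustration. The code assumes that Δ is a plain absolute difference and does not reproduce the paper's own evaluation code.

```python
# Hypothetical helper: recompute the table's Δ column and add a relative gain.
# Assumption: Δ = (This Paper AUC) - (Baseline AUC), an absolute difference.

# AUC values copied from the table: (setting, dataset) -> (baseline, this paper)
RESULTS = {
    ("MLE", "MIND"): (0.5824, 0.5891),
    ("MLE", "Alibaba"): (0.5731, 0.5802),
    ("MLE", "ML-100k"): (0.6401, 0.6511),
    ("Augmentation", "MIND"): (0.6135, 0.6272),
    ("Augmentation", "Alibaba"): (0.5815, 0.5983),
}


def auc_gain(baseline: float, ours: float) -> tuple[float, float]:
    """Return (absolute AUC delta rounded to 4 places, relative gain in percent)."""
    delta = ours - baseline
    return round(delta, 4), round(100.0 * delta / baseline, 2)


if __name__ == "__main__":
    for (setting, dataset), (base, ours) in RESULTS.items():
        delta, rel = auc_gain(base, ours)
        print(f"{setting:<13} {dataset:<8} delta={delta:+.4f}  ({rel:+.2f}%)")
```

Relative gain is reported alongside the absolute Δ because the baseline AUCs differ across datasets, so the same absolute improvement does not carry the same weight everywhere.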