| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|---|
| ColdLLM outperforms the baselines on the offline metrics (Recall@200 and NDCG@200), supporting the effectiveness of simulation. | | | | | |
| Alibaba Dataset | Recall@200 | 0.0521 | 0.0634 | +0.0113 |
| Alibaba Dataset | NDCG@200 | 0.0248 | 0.0298 | +0.0050 |
| CiteULike | Recall@200 | 0.0673 | 0.0784 | +0.0111 |
| Ablation studies confirm the necessity of the LLM component. | | | | | |
| Alibaba Dataset | Recall@200 | 0.0542 | 0.0634 | +0.0092 |
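
The Δ column is the absolute difference between the two score columns. As a minimal sketch, the snippet below recomputes it from the values in the table and also derives a relative improvement, which the table itself does not report. The helper name `deltas` and the "(ablation)" label on the last row are illustrative assumptions, not part of the source.

```python
# Rows copied from the table above: (benchmark, metric, baseline, this_paper).
# The "(ablation)" suffix is a label added here to tell the last row apart.
rows = [
    ("Alibaba Dataset", "Recall@200", 0.0521, 0.0634),
    ("Alibaba Dataset", "NDCG@200", 0.0248, 0.0298),
    ("CiteULike", "Recall@200", 0.0673, 0.0784),
    ("Alibaba Dataset (ablation)", "Recall@200", 0.0542, 0.0634),
]


def deltas(baseline, ours):
    """Return (absolute, relative) improvement of `ours` over `baseline`."""
    # Round to four decimals to match the precision used in the table.
    absolute = round(ours - baseline, 4)
    relative = absolute / baseline
    return absolute, relative


for benchmark, metric, baseline, ours in rows:
    absolute, relative = deltas(baseline, ours)
    print(f"{benchmark:28s} {metric:11s} {absolute:+.4f} ({relative:+.1%})")
```

Relative improvement is worth reading next to Δ because the baselines differ across benchmarks, so the same absolute gain can correspond to a different proportional gain.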