Performance comparison on sequential recommendation datasets (NDCG@1). RecCocktail consistently outperforms both breadth-oriented (P5) and depth-oriented (TALLRec) baselines.

| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| Beauty | NDCG@1 | 0.3347 | 0.4132 | +0.0785 |
| Toys | NDCG@1 | 0.3746 | 0.4097 | +0.0351 |
| Sports | NDCG@1 | 0.3585 | 0.3754 | +0.0169 |
| MovieLens-1M | NDCG@1 | 0.5392 | 0.5783 | +0.0391 |

Generalization capability testing using NDCG@3. The general module (RecCocktail-G) shows strong zero-shot performance compared to raw LLMs.

| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| Beauty | NDCG@3 | 0.0260 | 0.2072 | +0.1812 |
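
The Δ column appears to be the absolute difference between the "This Paper" and "Baseline" scores. As a minimal sketch, the check below recomputes it from the values in the rows above; the `rows` list and the `delta` helper are illustrative names, not part of the paper.

```python
# Values copied from the table rows: (benchmark, metric, baseline, this paper, reported delta).
# `rows` and `delta` are hypothetical names used only for this check.
rows = [
    ("Beauty", "NDCG@1", 0.3347, 0.4132, 0.0785),
    ("Toys", "NDCG@1", 0.3746, 0.4097, 0.0351),
    ("Sports", "NDCG@1", 0.3585, 0.3754, 0.0169),
    ("MovieLens-1M", "NDCG@1", 0.5392, 0.5783, 0.0391),
    ("Beauty", "NDCG@3", 0.0260, 0.2072, 0.1812),
]


def delta(baseline: float, ours: float) -> float:
    """Absolute improvement, rounded to the table's four decimal places."""
    return round(ours - baseline, 4)


for benchmark, metric, baseline, ours, reported in rows:
    computed = delta(baseline, ours)
    # Compare with a small tolerance to avoid floating-point noise.
    status = "ok" if abs(computed - reported) < 1e-9 else "mismatch"
    print(f"{benchmark:<13} {metric}  {computed:+.4f}  {status}")
```

Rounding to four decimals matches the precision reported in the table, so any mismatch would point to a transcription error rather than floating-point noise.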