| Benchmark | Metric | Baseline | This Paper | Δ (This Paper − Baseline) |
|---|---|---|---|---|
| Benchmark results on the Reddit-Amazon-EM dataset, showing that graph-based and LLM-based methods outperform traditional baselines. | ||||
| Reddit-Amazon-EM | F1 | 78.43 | 96.29 | +17.86 |
| Reddit-Amazon-EM | F1 | 72.29 | 96.29 | +24.00 |
| Reddit-Amazon-EM | F1 | 86.68 | 96.29 | +9.61 |
| Reddit-Amazon-EM | F1 | 94.02 | 96.29 | +2.27 |
| Downstream CRS evaluation measuring how well different EM methods retrieve ground-truth movies mentioned in LLM-generated recommendations. | ||||
| LLM-based CRS (GPT-3.5 responses) | Recall@5 | 7.22 | 7.84 | +0.62 |