Evaluation Setup
Leave-one-out evaluation on four Amazon datasets (Beauty, Toys-Games, Tools-Home, Office-Products).
Benchmarks:
- Amazon Beauty (Product Recommendation)
- Amazon Toys-Games (Product Recommendation)
- Amazon Tools-Home (Product Recommendation)
- Amazon Office-Products (Product Recommendation)
Metrics:
- Recall@10
- NDCG@10
- Statistical methodology: Not explicitly reported in the paper
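As an illustration of how these two metrics behave under leave-one-out evaluation, the sketch below scores each user against a single held-out item. It assumes a full ranking per user is already available; the summary does not say whether the paper ranks against all items or against sampled negatives, and the names `rankings`, `held_out`, and `evaluate` are hypothetical.

```python
import math


def recall_at_k(ranked_items, held_out_item, k=10):
    """With one held-out item, Recall@k is a hit indicator (1.0 or 0.0)."""
    return 1.0 if held_out_item in ranked_items[:k] else 0.0


def ndcg_at_k(ranked_items, held_out_item, k=10):
    """With one relevant item the ideal DCG is 1, so NDCG@k = 1 / log2(rank + 1)."""
    for position, item in enumerate(ranked_items[:k]):
        if item == held_out_item:
            # position is 0-based, so rank + 1 == position + 2
            return 1.0 / math.log2(position + 2)
    return 0.0


def evaluate(rankings, held_out, k=10):
    """Average both metrics over all users that have a held-out item.

    rankings: dict user -> list of item ids, best first (assumed given)
    held_out: dict user -> the single item withheld for testing
    """
    users = list(held_out)
    recall = sum(recall_at_k(rankings[u], held_out[u], k) for u in users) / len(users)
    ndcg = sum(ndcg_at_k(rankings[u], held_out[u], k) for u in users) / len(users)
    return recall, ndcg
```

Because each user contributes exactly one relevant item, Recall@10 here equals the hit rate at 10, and NDCG@10 only adds a discount for how far down the list the hit appears.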
Key Results
Performance of LLMInit-Var (Variance Selection) applied to LightGCN across four datasets compared to standard random initialization.

| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| Amazon Beauty | Recall@10 | 0.0910 | 0.1019 | +0.0109 |
| Amazon Toys-Games | Recall@10 | 0.0775 | 0.0808 | +0.0033 |
| Amazon Office-Products | Recall@10 | 0.0745 | 0.0816 | +0.0071 |

Performance of LLMInit-Var applied to SGCL (Supervised Graph Contrastive Learning) showing larger gains.

| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| Amazon Office-Products | Recall@10 | 0.0647 | 0.0776 | +0.0129 |
| Amazon Office-Products | NDCG@10 | 0.0298 | 0.0366 | +0.0068 |

Cold-start analysis: Applying LLMInit to SGCL in a sparse setting (users with single interaction).

| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| Amazon Beauty (Cold Start) | Recall@10 | 0.045 | 0.068 | +0.023 |
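The Δ column reports absolute differences, which are hard to compare across rows with different baselines. The following sketch recomputes Δ from the reported values and adds the relative gain; the `results` list simply restates the numbers above, and the helper name `gains` is hypothetical.

```python
# (base model, benchmark, metric, baseline, with LLMInit) as reported above
results = [
    ("LightGCN", "Amazon Beauty", "Recall@10", 0.0910, 0.1019),
    ("LightGCN", "Amazon Toys-Games", "Recall@10", 0.0775, 0.0808),
    ("LightGCN", "Amazon Office-Products", "Recall@10", 0.0745, 0.0816),
    ("SGCL", "Amazon Office-Products", "Recall@10", 0.0647, 0.0776),
    ("SGCL", "Amazon Office-Products", "NDCG@10", 0.0298, 0.0366),
    ("SGCL", "Amazon Beauty (Cold Start)", "Recall@10", 0.045, 0.068),
]


def gains(baseline, proposed):
    """Return (absolute gain, relative gain in percent of the baseline)."""
    absolute = proposed - baseline
    relative = absolute / baseline * 100.0
    return round(absolute, 4), round(relative, 1)


if __name__ == "__main__":
    for model, benchmark, metric, baseline, proposed in results:
        absolute, relative = gains(baseline, proposed)
        print(f"{model:9s} {benchmark:28s} {metric:10s} {absolute:+.4f} ({relative:+.1f}%)")
```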
Main Takeaways
- LLMInit consistently improves performance across all datasets and base models (LightGCN, SGL, SGCL), with Variance-based selection (LLMInit-Var) performing best.
- The method is particularly effective for advanced models like SGCL, where supervised loss functions can better exploit the informative initialization.
- LLMInit is far more efficient than heavy LLM-based baselines (such as LLMRec), requiring orders of magnitude fewer parameters (2M vs. 7B) while achieving competitive or better accuracy.
- Larger LLMs (e.g., GPT-L) do not necessarily yield better embeddings for initialization; domain alignment and embedding quality (as with MPNet) matter more than model size.
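To make the idea of variance-based selection concrete, here is a minimal sketch under stated assumptions rather than the paper's implementation. It assumes that LLMInit-Var keeps the embedding dimensions with the highest variance across items so that a large LLM embedding fits the smaller embedding size of the recommender, and that user embeddings are initialized by mean-pooling the embeddings of interacted items. Both helper names (`select_by_variance`, `mean_pool_users`) are hypothetical, and the pure-Python lists stand in for the tensors a real pipeline would use.

```python
from statistics import pvariance


def select_by_variance(item_embeddings, target_dim):
    """Keep the target_dim dimensions with the highest variance across items.

    item_embeddings: list of equal-length float lists (one row per item),
    e.g. text embeddings from a pretrained language model (assumed given).
    Returns the reduced embeddings and the kept dimension indices.
    """
    n_dims = len(item_embeddings[0])
    variances = [pvariance([row[d] for row in item_embeddings]) for d in range(n_dims)]
    ranked = sorted(range(n_dims), key=lambda d: variances[d], reverse=True)
    kept = sorted(ranked[:target_dim])  # restore the original dimension order
    reduced = [[row[d] for d in kept] for row in item_embeddings]
    return reduced, kept


def mean_pool_users(item_init, interactions, n_users):
    """Initialize each user as the mean of their interacted items (an assumption).

    interactions: iterable of (user_index, item_index) pairs.
    Users without interactions fall back to an all-zero vector.
    """
    dim = len(item_init[0])
    sums = [[0.0] * dim for _ in range(n_users)]
    counts = [0] * n_users
    for user, item in interactions:
        counts[user] += 1
        for d in range(dim):
            sums[user][d] += item_init[item][d]
    return [
        [value / counts[u] if counts[u] else 0.0 for value in sums[u]]
        for u in range(n_users)
    ]
```

The resulting item and user matrices would replace the random initialization of the embedding tables in a model such as LightGCN, which is then trained as usual. This is consistent with the small parameter footprint noted above, since the LLM is only used once to produce the initial vectors.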