Evaluation Setup
Task: personalized top-K recommendation
Benchmarks:
- MovieLens-1M (Movie Recommendation)
- Recipe (Recipe Recommendation)
Metrics:
- NDCG@10
- Precision@10
- Recall@10
Statistical methodology: results are reported as mean and standard deviation across five different splits.
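The three metrics and the per-split aggregation can be sketched as follows. This is a minimal illustration that assumes binary relevance (an item is either in the user's held-out set or not) and the sample standard deviation; the summary does not state which exact variants the paper uses, and all function names here are hypothetical.

```python
import math
from statistics import mean, stdev


def precision_at_k(ranked, relevant, k=10):
    """Fraction of the top-k recommended items that are relevant."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k


def recall_at_k(ranked, relevant, k=10):
    """Fraction of the relevant items that appear in the top-k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked, relevant, k=10):
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(position + 2)
        for position, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    if ideal_hits == 0:
        return 0.0
    idcg = sum(1.0 / math.log2(position + 2) for position in range(ideal_hits))
    return dcg / idcg


def summarize(per_split_scores):
    """Mean and sample standard deviation over the splits (assumed: sample std)."""
    return mean(per_split_scores), stdev(per_split_scores)


if __name__ == "__main__":
    ranked = ["a", "b", "c", "d"]   # model's ranking for one user (toy data)
    relevant = {"a", "c", "x"}      # that user's held-out items (toy data)
    print(precision_at_k(ranked, relevant, k=3))
    print(recall_at_k(ranked, relevant, k=3))
    print(ndcg_at_k(ranked, relevant, k=3))
```

In practice each metric is averaged over all test users within a split, and `summarize` is then applied to the five per-split averages.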
Key Results
| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| Performance on MovieLens-1M showing LLM-Rec improvements over MLP baseline and complex content methods. | | | | |
| MovieLens-1M | NDCG@10 | 0.3640 | 0.3867 | +0.0227 |
| MovieLens-1M | NDCG@10 | 0.3640 | 0.3951 | +0.0311 |
| Performance on Recipe dataset showing larger gains due to sparse original metadata. | | | | |
| Recipe | NDCG@10 | 0.0580 | 0.0706 | +0.0126 |
| Recipe | NDCG@10 | 0.0652 | 0.0706 | +0.0054 |
| MovieLens-1M | NDCG@10 | 0.3824 | 0.3951 | +0.0127 |
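Because baseline NDCG@10 differs by roughly an order of magnitude between the two benchmarks, absolute Δ values are hard to compare directly. The short sketch below recomputes Δ from the reported scores and adds the relative gain (Δ divided by the baseline). The relative figures are derived here from the table and are not reported in the source; the row labels are just the benchmark names.

```python
# (benchmark, baseline NDCG@10, this paper NDCG@10), copied from the table rows in order.
ROWS = [
    ("MovieLens-1M", 0.3640, 0.3867),
    ("MovieLens-1M", 0.3640, 0.3951),
    ("Recipe", 0.0580, 0.0706),
    ("Recipe", 0.0652, 0.0706),
    ("MovieLens-1M", 0.3824, 0.3951),
]


def gains(baseline, ours):
    """Return (absolute delta, relative gain in percent), rounded for display."""
    delta = ours - baseline
    return round(delta, 4), round(100.0 * delta / baseline, 1)


if __name__ == "__main__":
    for benchmark, baseline, ours in ROWS:
        delta, relative = gains(baseline, ours)
        print(f"{benchmark:<13} delta={delta:+.4f} relative={relative:+.1f}%")
```

The largest relative gain among these rows is on Recipe (0.0580 to 0.0706, about 21.7%), even though its absolute Δ is smaller than the first two MovieLens-1M rows.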
Main Takeaways
- Augmented text significantly enhances recommendation quality, especially for datasets with sparse or incomplete descriptions (Recipe).
- LLM-Rec enables simple MLP models to outperform complex feature-interaction models (AutoInt, DCN-V2, EDCN), suggesting input quality is more critical than model complexity.
- Engagement-guided prompting is highly effective, leveraging user behavior to guide the LLM's text generation.
- Open-source models (Llama-2) are competitive with proprietary models (GPT-3) for this augmentation task.
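To make the engagement-guided idea concrete, the sketch below shows one way user behavior could steer the prompt: pick the items most often co-engaged with the target item, then ask the LLM to describe what they have in common. Everything here is an assumption for illustration: the neighbor-selection rule (raw co-engagement counts), the function names, and the prompt wording are hypothetical and are not taken from the paper.

```python
from collections import Counter


def co_engaged_neighbors(interactions, target, top_n=3):
    """Items most often engaged with by users who also engaged with `target`.

    `interactions` maps a user id to the set of item ids that user engaged with.
    Ties are broken alphabetically so the result is deterministic.
    """
    counts = Counter()
    for items in interactions.values():
        if target in items:
            counts.update(item for item in items if item != target)
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:top_n]]


def build_engagement_prompt(target_description, neighbor_descriptions):
    """Assemble a prompt that asks an LLM to summarize shared characteristics."""
    listed = "\n".join(f"- {text}" for text in neighbor_descriptions)
    return (
        "Summarize what the following item descriptions have in common.\n"
        f"Target item: {target_description}\n"
        "Items frequently engaged with by the same users:\n"
        f"{listed}"
    )


if __name__ == "__main__":
    # Toy data: which users engaged with which items.
    interactions = {
        "u1": {"m1", "m2", "m3"},
        "u2": {"m1", "m2"},
        "u3": {"m2", "m4"},
    }
    descriptions = {
        "m1": "A heist thriller set in Paris.",
        "m2": "A crime drama about a getaway driver.",
        "m3": "A spy comedy.",
    }
    neighbors = co_engaged_neighbors(interactions, "m1", top_n=2)
    prompt = build_engagement_prompt(
        descriptions["m1"], [descriptions[n] for n in neighbors]
    )
    print(prompt)
```

The LLM's response to such a prompt would then serve as additional text input for the recommender, alongside the original item description.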