Evaluation Setup
Sequential Recommendation on Amazon datasets
Benchmarks:
- Amazon Beauty (Sequential Recommendation)
- Amazon Sports (Sequential Recommendation)
- Amazon Toys (Sequential Recommendation)
Metrics:
- Hit@10
- NDCG@10
- Statistical methodology: Not explicitly reported in the paper
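The two metrics above can be made concrete with a short sketch. It assumes a leave-one-out protocol with a single held-out target item per user, which is common for sequential recommendation but is not confirmed by this summary. The function names (`hit_at_k`, `ndcg_at_k`, `evaluate`) are illustrative and not taken from the paper.

```python
import math

def hit_at_k(ranked_items, target, k=10):
    """Return 1.0 if the held-out target appears in the top-k list, else 0.0."""
    return 1.0 if target in ranked_items[:k] else 0.0

def ndcg_at_k(ranked_items, target, k=10):
    """NDCG@k for one relevant item: 1 / log2(position + 1), position being 1-based.

    With a single relevant item the ideal DCG is 1, so no extra normalization is needed.
    """
    top_k = ranked_items[:k]
    if target not in top_k:
        return 0.0
    rank = top_k.index(target)  # 0-based position in the ranked list
    return 1.0 / math.log2(rank + 2)

def evaluate(predictions, targets, k=10):
    """Average Hit@k and NDCG@k over users (assumed: one target per user)."""
    n = len(targets)
    hit = sum(hit_at_k(p, t, k) for p, t in zip(predictions, targets)) / n
    ndcg = sum(ndcg_at_k(p, t, k) for p, t in zip(predictions, targets)) / n
    return {"hit": hit, "ndcg": ndcg}

# Toy example: the first user's target is ranked second, the second user's is missed.
scores = evaluate([["a", "b", "c"], ["x", "y", "z"]], ["b", "q"], k=10)
print(scores)
```

Under this assumption NDCG@10 rewards placing the target near the top of the list, while Hit@10 only checks whether it appears in the top ten at all.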
Key Results
| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| Performance comparison in ID-based setting showing SLIM enhances traditional backbones. |
| Amazon Sports | NDCG@10 | 0.0246 | 0.0267 | +0.0021 |
| Amazon Beauty | NDCG@10 | 0.0435 | 0.0461 | +0.0026 |
| Performance in ID-agnostic (transductive/inductive) settings, demonstrating strong generalization. |
| Amazon Toys | NDCG@10 | 0.0384 | 0.0441 | +0.0057 |
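The absolute Δ values above are small because NDCG@10 itself is small on these datasets, so relative improvement can be easier to compare across rows. The sketch below recomputes it from the reported numbers only. The `rows` layout and the `relative_gain` helper are illustrative, not part of the paper.

```python
# (setting, benchmark, baseline NDCG@10, SLIM NDCG@10), copied from the table above.
rows = [
    ("ID-based", "Amazon Sports", 0.0246, 0.0267),
    ("ID-based", "Amazon Beauty", 0.0435, 0.0461),
    ("ID-agnostic", "Amazon Toys", 0.0384, 0.0441),
]

def relative_gain(baseline, ours):
    """Relative improvement over the baseline, in percent."""
    return (ours - baseline) / baseline * 100.0

gains = {name: round(relative_gain(base, ours), 1) for _, name, base, ours in rows}

for setting, name, _, _ in rows:
    print(f"{name} ({setting}): +{gains[name]}%")
# Amazon Sports (ID-based): +8.5%
# Amazon Beauty (ID-based): +6.0%
# Amazon Toys (ID-agnostic): +14.8%
```

Among the three rows shown here, the ID-agnostic row has the largest relative gain. Since each row uses a different dataset, this is an observation about the reported numbers rather than a controlled comparison between settings.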
Main Takeaways
- SLIM consistently improves performance over baselines in both ID-based and ID-agnostic settings, validating the utility of distilled rationales.
- The generated rationales provide effective open-world knowledge that complements the collaborative signals in traditional datasets.
- The student model (LLaMA2-7B) successfully learns the teacher's step-by-step reasoning pattern and generates high-quality text rationales.
- ID-agnostic performance is particularly boosted (+0.0057 NDCG@10 on Amazon Toys, the largest gain among the reported rows), suggesting that the natural language rationales help bridge the gap when specific item IDs are less informative or absent.