Evaluation Setup
Evaluation follows a sequential recommendation simulation: the last item in each user's history is held out as the test target, and agents simulate interactions with the preceding items to build memory before ranking candidates.
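The hold-out step can be illustrated with a minimal sketch. It assumes each history is already in chronological order; the function name `leave_one_out` and the sample item IDs are hypothetical and not taken from the paper.

```python
from typing import Dict, List, Tuple


def leave_one_out(histories: Dict[str, List[str]]) -> Dict[str, Tuple[List[str], str]]:
    """Split each chronologically ordered history into (memory items, test target)."""
    splits = {}
    for user, items in histories.items():
        if len(items) < 2:
            # Need at least one item to build memory and one to hold out.
            continue
        splits[user] = (items[:-1], items[-1])
    return splits


# Hypothetical toy data: u2 is skipped because it has only one interaction.
histories = {"u1": ["book_a", "book_b", "book_c"], "u2": ["book_d"]}
splits = leave_one_out(histories)
```

Here `splits["u1"]` holds the two earlier items as memory-building interactions and `"book_c"` as the target to be ranked among candidates.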
Benchmarks:
- Amazon-Book (Sequential Recommendation)
- Yelp (Sequential Recommendation)
- ML-1M (Movie Recommendation)
Metrics:
- NDCG@1
- NDCG@5
- Statistical methodology: Not explicitly reported in the paper
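Because each user has a single held-out target, NDCG@k reduces to a simple form: the ideal DCG is 1, so the score is 1/log2(rank + 1) when the target appears in the top k and 0 otherwise. The sketch below assumes binary relevance with exactly one relevant item; the function name `ndcg_at_k` is illustrative, not the paper's code.

```python
import math
from typing import Sequence


def ndcg_at_k(ranked: Sequence[str], target: str, k: int) -> float:
    """NDCG@k for one relevant item: 1/log2(rank + 1) if it is in the top k, else 0."""
    top_k = list(ranked[:k])
    if target not in top_k:
        return 0.0
    rank = top_k.index(target) + 1  # 1-based position of the target
    return 1.0 / math.log2(rank + 1)


ranking = ["item_a", "item_b", "item_c"]
score_at_1 = ndcg_at_k(ranking, "item_b", k=1)  # target is outside the top 1
score_at_5 = ndcg_at_k(ranking, "item_b", k=5)  # target is at rank 2
```

Under this assumption NDCG@1 equals the fraction of users whose target is ranked first, which is why it is the stricter of the two metrics.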
Key Results
KGLA consistently outperforms baselines across all datasets on NDCG@1.

| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| Amazon-Book | NDCG@1 | 0.0515 | 0.1006 | +0.0491 |
| Yelp | NDCG@1 | 0.0461 | 0.0649 | +0.0188 |
| ML-1M | NDCG@1 | 0.1480 | 0.1972 | +0.0492 |

KGLA also shows significant gains in NDCG@5 compared to AgentCF.

| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| Amazon-Book | NDCG@5 | 0.0827 | 0.1585 | +0.0758 |
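The Δ column reports absolute differences. Relative gains, which are easier to compare across datasets with different baseline levels, can be derived from the same reported figures. The sketch below uses only the numbers in the results above; the helper `relative_gain` is an illustrative name.

```python
# (benchmark, metric, baseline, this paper) as reported in the results above.
rows = [
    ("Amazon-Book", "NDCG@1", 0.0515, 0.1006),
    ("Yelp", "NDCG@1", 0.0461, 0.0649),
    ("ML-1M", "NDCG@1", 0.1480, 0.1972),
    ("Amazon-Book", "NDCG@5", 0.0827, 0.1585),
]


def relative_gain(baseline: float, ours: float) -> float:
    """Improvement expressed as a fraction of the baseline score."""
    return (ours - baseline) / baseline


gains = {(name, metric): relative_gain(base, ours) for name, metric, base, ours in rows}

for (name, metric), gain in gains.items():
    print(f"{name} {metric}: {gain:+.1%}")
```

On these figures the relative improvement is roughly 95% and 92% on Amazon-Book (NDCG@1 and NDCG@5), about 41% on Yelp, and about 33% on ML-1M, so the largest proportional gains occur where the baseline is weakest.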
Main Takeaways
- Incorporating KG paths as textual rationales improves the accuracy of simulated user profiles compared to purely interaction-based simulation; on Amazon-Book, NDCG@1 rises from 0.0515 (baseline) to 0.1006.
- The method improves results in all three domains evaluated (books, local businesses, movies), which suggests it is robust across domains.
- The approach bridges the gap between deep learning-based collaborative filtering and explicit, explainable LLM-based profiling.
- 2-hop paths provide direct relational context, while 3-hop paths offer broader descriptive features that help distinguish preferences.
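To make the 2-hop versus 3-hop distinction concrete, the following sketch enumerates fixed-length paths between a user and a candidate item in a toy knowledge graph and renders them as text. Everything here is an assumption for illustration: the entities, the relation names, the `paths_with_hops` and `to_text` helpers, and the sentence template are hypothetical, and the paper's actual path extraction and prompt format are not described in this summary.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)

# Toy knowledge graph: head -> list of (relation, tail). All names are made up.
KG: Dict[str, List[Tuple[str, str]]] = {
    "user_1": [("liked", "Dune"), ("interested_in", "science fiction")],
    "Dune": [("written_by", "Frank Herbert"), ("has_genre", "science fiction")],
    "Frank Herbert": [("wrote", "Children of Dune")],
    "science fiction": [("genre_of", "Children of Dune")],
}


def paths_with_hops(kg, source: str, target: str, hops: int) -> List[List[Triple]]:
    """Return all cycle-free paths from source to target with exactly `hops` edges."""
    results: List[List[Triple]] = []

    def walk(node: str, path: List[Triple], visited: frozenset) -> None:
        if len(path) == hops:
            if node == target:
                results.append(list(path))
            return
        for relation, neighbor in kg.get(node, []):
            if neighbor not in visited:
                path.append((node, relation, neighbor))
                walk(neighbor, path, visited | {neighbor})
                path.pop()

    walk(source, [], frozenset({source}))
    return results


def to_text(path: List[Triple]) -> str:
    """Render a path as a plain-text rationale (hypothetical template)."""
    return ", ".join(f"{h} {r.replace('_', ' ')} {t}" for h, r, t in path)


two_hop = paths_with_hops(KG, "user_1", "Children of Dune", hops=2)
three_hop = paths_with_hops(KG, "user_1", "Children of Dune", hops=3)
```

In this toy graph the single 2-hop path links the user to the candidate through one shared entity, while the 3-hop paths pass through a previously liked item and one of its attributes, which gives more descriptive detail at the cost of longer text.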