Evaluation Setup
Interactive recommendation simulation using public datasets.
Benchmarks:
- Steam (Game Recommendation)
- MovieLens (Movie Recommendation)
- Amazon Beauty (Product Recommendation)
Metrics:
- Not reported in the provided text (likely Hit Ratio (HR), NDCG, and conversational metrics, judging from context)
- Statistical methodology: not explicitly reported in the paper
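Since the excerpt names HR and NDCG only as likely metrics, the following is a minimal sketch of how they are commonly computed in a leave-one-out setting with a single held-out target item per user. This is an assumption about the protocol, not the paper's confirmed evaluation code; the function names `hit_ratio_at_k` and `ndcg_at_k` are illustrative.

```python
import math
from typing import Sequence


def hit_ratio_at_k(ranked: Sequence[str], target: str, k: int) -> float:
    """Return 1.0 if the target item is among the top-k ranked items, else 0.0."""
    return 1.0 if target in ranked[:k] else 0.0


def ndcg_at_k(ranked: Sequence[str], target: str, k: int) -> float:
    """NDCG@k with exactly one relevant item (assumed leave-one-out setup).

    With a single relevant item the ideal DCG is 1, so the score reduces
    to 1 / log2(rank + 1), where rank is the 1-based position of the target.
    """
    top_k = list(ranked[:k])
    if target not in top_k:
        return 0.0
    rank = top_k.index(target) + 1  # 1-based position
    return 1.0 / math.log2(rank + 1)


# Toy ranking with made-up item IDs, for illustration only.
ranking = ["item_a", "item_b", "item_c"]
hr = hit_ratio_at_k(ranking, "item_b", k=2)
ndcg = ndcg_at_k(ranking, "item_b", k=2)
```

In practice both values would be averaged over all evaluated users or sessions.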
Key Results
| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| RecLlama Dataset | Total Samples | 0 | 16183 | +16183 |
Main Takeaways
- The proposed InteRecAgent framework successfully decouples reasoning (LLM) from domain knowledge (Tools) using a Candidate Bus.
- RecLlama (7B) is proposed as a cost-effective alternative to GPT-4, trained on 16,183 (about 16k) imitation samples.
- The Plan-First strategy is designed to minimize inference costs compared to Step-by-Step (ReAct) approaches.
- Note: The text describes quantitative performance (Hit Ratio, NDCG) as 'satisfying', but the provided excerpt does not include the corresponding tables.
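To make the Candidate Bus and Plan-First ideas in the takeaways above more concrete, here is a minimal sketch. It assumes the bus is a shared store of candidate items that each tool reads and updates, and that the LLM emits a complete tool plan up front, which is then executed without further LLM calls. All names (`CandidateBus`, `filter_tool`, `rank_tool`, `execute_plan`) and the toy catalog are hypothetical, and the plan is hard-coded where an LLM would normally generate it. This is not the paper's implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Item = Dict[str, Any]


@dataclass
class CandidateBus:
    """Hypothetical shared state: the current candidate items plus a trace."""
    candidates: List[Item]
    trace: List[str] = field(default_factory=list)


def filter_tool(bus: CandidateBus, args: Dict[str, Any]) -> None:
    """Keep only candidates whose attribute `key` equals `value`."""
    key, value = args["key"], args["value"]
    bus.candidates = [it for it in bus.candidates if it.get(key) == value]
    bus.trace.append(f"filter {key}={value}: {len(bus.candidates)} left")


def rank_tool(bus: CandidateBus, args: Dict[str, Any]) -> None:
    """Sort candidates by a numeric attribute (descending) and keep the top_k."""
    by, top_k = args["by"], args["top_k"]
    ordered = sorted(bus.candidates, key=lambda it: it[by], reverse=True)
    bus.candidates = ordered[:top_k]
    bus.trace.append(f"rank by {by}: kept {len(bus.candidates)}")


TOOLS: Dict[str, Callable[[CandidateBus, Dict[str, Any]], None]] = {
    "filter": filter_tool,
    "rank": rank_tool,
}


def execute_plan(plan: List[Dict[str, Any]], bus: CandidateBus) -> List[Item]:
    """Run every step of a pre-generated plan; no LLM call happens per step."""
    for step in plan:
        TOOLS[step["tool"]](bus, step["args"])
    return bus.candidates


# Made-up catalog and a hard-coded plan standing in for one LLM planning call.
catalog = [
    {"title": "A", "genre": "puzzle", "score": 0.90},
    {"title": "B", "genre": "rpg", "score": 0.80},
    {"title": "C", "genre": "puzzle", "score": 0.95},
    {"title": "D", "genre": "puzzle", "score": 0.40},
]
plan = [
    {"tool": "filter", "args": {"key": "genre", "value": "puzzle"}},
    {"tool": "rank", "args": {"by": "score", "top_k": 2}},
]
bus = CandidateBus(candidates=list(catalog))
result = execute_plan(plan, bus)
```

Under these assumptions, a step-by-step (ReAct-style) loop would call the LLM once per tool invocation, whereas the plan here is produced in a single call and then executed deterministically. That difference is the intuition behind the inference-cost claim.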