Evaluation Setup
Experiments use the U-NEED dataset, which covers five product categories (Beauty, Phones, Fashion, Shoes, Electronics).
Benchmarks:
- U-NEED Dataset (E-commerce pre-sales dialogue)
Metrics:
- Precision
- Recall
- F1 score
- Hit@K
- MRR@K
- Distinct-n
Statistical methodology: Not explicitly reported in the paper
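The metrics listed above can be illustrated with a minimal Python sketch. It uses the common textbook definitions, which are an assumption here: the summarized text does not give the paper's exact formulations (for example, how tokens are split for Distinct-n or how scores are averaged across dialogues). All function names are hypothetical and chosen only for illustration.

```python
from typing import Hashable, Sequence, Set, Tuple


def precision_recall_f1(predicted: Set[Hashable], gold: Set[Hashable]) -> Tuple[float, float, float]:
    """Set-based precision, recall, and F1 for a single example."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def hit_at_k(ranked: Sequence[Hashable], target: Hashable, k: int) -> float:
    """1.0 if the target item appears in the top-k ranked items, else 0.0."""
    return 1.0 if target in ranked[:k] else 0.0


def mrr_at_k(ranked: Sequence[Hashable], target: Hashable, k: int) -> float:
    """Reciprocal rank of the target within the top-k items (0.0 if absent)."""
    for rank, item in enumerate(ranked[:k], start=1):
        if item == target:
            return 1.0 / rank
    return 0.0


def distinct_n(responses: Sequence[str], n: int) -> float:
    """Ratio of unique n-grams to total n-grams over all responses.

    Assumes whitespace tokenization, which may differ from the paper's setup.
    """
    total = 0
    unique = set()
    for response in responses:
        tokens = response.split()
        ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(ngrams)
        unique.update(ngrams)
    return len(unique) / total if total else 0.0
```

In practice, Hit@K and MRR@K would be averaged over all test dialogues, and Precision, Recall, and F1 would typically be aggregated per category or over the whole test set; the aggregation scheme is likewise not specified in the summarized text.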
Main Takeaways
- Quantitative results were not included in the provided text (source text ends at Section 4.1), so specific performance deltas cannot be reported.
- The authors qualitatively claim that collaboration between large language models (LLMs) and conversational recommender systems (CRSs) is effective for dialogue understanding, user needs elicitation, and recommendation.
- The authors note that LLM and CRS strengths are complementary: LLMs provide semantic understanding while CRSs provide grounding in candidate items.