Evaluation Setup
Conversational recommendation in the restaurant domain using the MGConvRex dataset
Benchmarks:
- MGConvRex (conversational recommendation: dialog policy and item ranking) [New]
Metrics:
- Act Accuracy
- Act F1
- Entity Matching Rate (EMR @1, @3, @5)
- Item Matching Rate (IMR)
- Success Rate (Online Simulation)
- Statistical methodology: Results averaged over 3 runs for online simulation. No significance tests reported for offline metrics.
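As a rough illustration of how a top-k matching rate and the run averaging could be computed, here is a minimal Python sketch. The definition used (the fraction of examples whose gold label appears among the top-k ranked predictions) is an assumption, not the paper's stated formula, and `matching_rate_at_k`, `average_over_runs`, and the toy data are hypothetical names and values chosen for illustration.

```python
from statistics import mean


def matching_rate_at_k(ranked_predictions, gold_labels, k):
    """Fraction of examples whose gold label is among the top-k predictions.

    Assumed definition for illustration; the paper may define EMR/IMR differently.
    """
    if len(ranked_predictions) != len(gold_labels):
        raise ValueError("predictions and gold labels must have the same length")
    if not gold_labels:
        return 0.0
    hits = sum(
        1 for ranked, gold in zip(ranked_predictions, gold_labels)
        if gold in ranked[:k]
    )
    return hits / len(gold_labels)


def average_over_runs(run_scores):
    """Mean of a metric over repeated runs (e.g. 3 online simulation runs)."""
    return mean(run_scores)


# Toy data (hypothetical): ranked entity predictions for three dialog turns.
ranked = [
    ["price", "location", "food"],
    ["food", "price", "location"],
    ["location", "food", "price"],
]
gold = ["price", "location", "price"]

rate_at_1 = matching_rate_at_k(ranked, gold, 1)
rate_at_3 = matching_rate_at_k(ranked, gold, 3)
```

With this toy data only the first turn is matched at k=1, while all three are matched at k=3, which is why such metrics are usually reported at several cutoffs.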
Key Results
| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| MGConvRex | Act Accuracy | 59.46 | 65.70 | +6.24 |
| MGConvRex | Item Matching Rate (IMR) | 29.02 | 48.47 | +19.45 |
| MGConvRex (Simulator) | Success Rate | 39.16 | 67.93 | +28.77 |
| MGConvRex | Success Rate | 27.50 | 67.93 | +40.43 |
| MGConvRex | Act Accuracy | 42.37 | 65.70 | +23.33 |
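The Δ column is the absolute difference in percentage points between this paper's score and the baseline's. A small Python check of that arithmetic, using the values reported above (the `rows` layout and the `delta` helper are illustrative only, not from the paper):

```python
# (benchmark, metric, baseline score, this paper's score), as reported above.
rows = [
    ("MGConvRex", "Act Accuracy", 59.46, 65.70),
    ("MGConvRex", "Item Matching Rate (IMR)", 29.02, 48.47),
    ("MGConvRex (Simulator)", "Success Rate", 39.16, 67.93),
    ("MGConvRex", "Success Rate", 27.50, 67.93),
    ("MGConvRex", "Act Accuracy", 42.37, 65.70),
]


def delta(baseline, ours):
    """Absolute improvement in percentage points, rounded to two decimals."""
    return round(ours - baseline, 2)


deltas = [delta(baseline, ours) for _, _, baseline, ours in rows]

for (benchmark, metric, baseline, ours), d in zip(rows, deltas):
    print(f"{benchmark} | {metric}: {baseline:.2f} -> {ours:.2f} ({d:+.2f})")
```

These are absolute gains; a relative improvement would instead divide each difference by the baseline score.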
Main Takeaways
- Dynamic graph updates are crucial: using a static graph (Static G) severely degrades the success rate, which indicates that accumulating memory online during the conversation is valuable.
- Graph-based reasoning (UMGR) substantially outperforms Memory Networks and Pretrained Embeddings, likely because it handles structure and open policy spaces better.
- The model successfully generalizes to unseen users (zero-shot) because it relies on graph relations rather than user-specific IDs.
- The system is effective at item recommendation, but it tends to recommend more aggressively than humans, whose dialog behavior is more diverse.