| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| General performance comparisons on the Classic Recommendation task showing agentic methods outperforming traditional baselines. | ||||
| Yelp | HR@5 | 0.012 | 0.051 | +0.039 |
| Cold-start scenarios where agentic systems show robust generalization compared to traditional methods that fail with sparse data. | ||||
| Yelp | HR@5 | 0.005 | 0.045 | +0.040 |
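
For readers unfamiliar with the metric, the sketch below shows one common way HR@5 (hit rate at 5) is computed: the fraction of test users whose held-out item appears in their top-5 recommendations. This is an illustration under assumptions, not the paper's evaluation code: the single-held-out-item protocol, the function name `hit_rate_at_k`, and the toy data are all hypothetical. The Δ column in the table is simply "This Paper" minus "Baseline".

```python
from typing import Hashable, Mapping, Sequence


def hit_rate_at_k(
    ranked: Mapping[Hashable, Sequence[Hashable]],
    held_out: Mapping[Hashable, Hashable],
    k: int = 5,
) -> float:
    """Fraction of users whose held-out item appears in their top-k list.

    Assumes one held-out item per user (a leave-one-out style protocol);
    the source table does not state which protocol was used.
    """
    if not held_out:
        raise ValueError("held_out must contain at least one user")
    hits = 0
    for user, target in held_out.items():
        # Users with no recommendation list count as misses.
        top_k = list(ranked.get(user, []))[:k]
        if target in top_k:
            hits += 1
    return hits / len(held_out)


# Toy data (hypothetical): ranked recommendations and held-out items.
ranked = {
    "u1": ["a", "b", "c", "d", "e", "f"],
    "u2": ["x", "y", "z"],
    "u3": ["f", "a"],
}
held_out = {"u1": "f", "u2": "y", "u3": "q", "u4": "a"}

hr5 = hit_rate_at_k(ranked, held_out, k=5)
```

In this toy example only `u2` has its held-out item in the top 5, so one of the four users is a hit. Under this definition, a reported value of 0.051 would mean roughly 5 in 100 test users received their held-out item in the top 5.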