| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| **Main results.** OneRec-Think consistently outperforms state-of-the-art baselines, including both traditional sequential models and recent generative approaches, across all three Amazon datasets. | | | | |
| Amazon Beauty | Recall@5 | 0.0701 | 0.0768 | +0.0067 |
| Amazon Toys | Recall@5 | 0.0725 | 0.0805 | +0.0080 |
| Amazon Sports | Recall@5 | 0.0461 | 0.0525 | +0.0064 |
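| **Ablation.** Ablation studies indicate that both the Itemic Alignment and the Reasoning components are essential; in the row below, the Baseline column reports an ablated variant rather than an external baseline. | | | | |
| Amazon Beauty (ablated variant vs. full model) | Recall@5 | 0.0658 | 0.0768 | +0.0110 |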
| **Online A/B test.** In an industrial A/B test on the Kuaishou app, OneRec-Think increases APP Stay Time relative to the control group. | | | | |
| Kuaishou App | APP Stay Time (relative change) | 0.000% (control) | +0.159% | +0.159% |
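To make the Δ column easier to audit, the sketch below recomputes each absolute gain and the matching relative gain from the Recall@5 values in the table. It also includes an illustrative `recall_at_k` helper. The helper names (`recall_at_k`, `gains`) are hypothetical, and the leave-one-out assumption (exactly one held-out item per user) is a common protocol for these Amazon datasets, not something confirmed from the paper's evaluation code.

```python
from typing import Sequence


def recall_at_k(ranked_items: Sequence[Sequence[int]],
                targets: Sequence[int],
                k: int = 5) -> float:
    """Fraction of users whose single held-out item appears in their top-k list.

    ASSUMPTION: leave-one-out evaluation with one ground-truth item per user.
    This is an illustrative definition, not the paper's evaluation code.
    """
    if len(ranked_items) != len(targets):
        raise ValueError("need exactly one ranked list per target item")
    hits = sum(1 for ranked, target in zip(ranked_items, targets)
               if target in ranked[:k])
    return hits / len(targets)


def gains(baseline: float, ours: float) -> tuple[float, float]:
    """Return (absolute delta, relative delta in percent of the baseline)."""
    absolute = ours - baseline
    return absolute, 100.0 * absolute / baseline


# Recall@5 values copied from the table above: (baseline, OneRec-Think).
ROWS = {
    "Amazon Beauty": (0.0701, 0.0768),
    "Amazon Toys": (0.0725, 0.0805),
    "Amazon Sports": (0.0461, 0.0525),
    "Amazon Beauty (ablation)": (0.0658, 0.0768),
}

if __name__ == "__main__":
    for name, (base, ours) in ROWS.items():
        abs_delta, rel_delta = gains(base, ours)
        print(f"{name:<26} delta={abs_delta:+.4f} ({rel_delta:+.1f}%)")
```

Running it gives relative gains of about +9.6% (Beauty), +11.0% (Toys), +13.9% (Sports), and +16.7% for the full model over the ablated variant on Beauty. Reporting the relative figure alongside the absolute Δ helps here because the Sports baseline is much lower, so its smaller absolute gain is actually the largest relative improvement among the three datasets.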