| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| MovieLens-1M with a T5-Base backbone: GFlowGR outperforms both the SFT and RL baselines on NDCG@5. | ||||
| MovieLens-1M | NDCG@5 | 0.1340 | 0.1701 | +0.0361 |
| MovieLens-1M | NDCG@5 | 0.1398 | 0.1701 | +0.0303 |
| Amazon Beauty with a Llama-130M backbone: the Recall@5 gain suggests the method generalizes to a different model architecture. | ||||
| Amazon Beauty | Recall@5 | 0.0614 | 0.0762 | +0.0148 |
| Online A/B testing results from Taobao deployment. | ||||
| Taobao Production System | Revenue (RPM), relative lift over control (%) | 0 (control) | 1.0 | +1% |
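
The Δ column for the offline rows is the absolute difference between the two scores. As a quick sanity check, the sketch below recomputes it and also derives a relative gain in percent. The relative figures are not reported in the table; they are derived here only for illustration, and the helper names `absolute_gain` and `relative_gain_pct` are hypothetical. The Taobao row is left out because its baseline is a control value of 0, so a relative gain is undefined.

```python
# Offline rows copied from the table: (benchmark, metric, baseline, this_paper, reported_delta)
OFFLINE_ROWS = [
    ("MovieLens-1M", "NDCG@5", 0.1340, 0.1701, 0.0361),
    ("MovieLens-1M", "NDCG@5", 0.1398, 0.1701, 0.0303),
    ("Amazon Beauty", "Recall@5", 0.0614, 0.0762, 0.0148),
]


def absolute_gain(baseline: float, ours: float) -> float:
    """Absolute improvement, rounded to the table's four decimal places."""
    return round(ours - baseline, 4)


def relative_gain_pct(baseline: float, ours: float) -> float:
    """Relative improvement over the baseline in percent (derived, not reported)."""
    return round((ours - baseline) / baseline * 100, 1)


if __name__ == "__main__":
    for benchmark, metric, baseline, ours, reported in OFFLINE_ROWS:
        delta = absolute_gain(baseline, ours)
        rel = relative_gain_pct(baseline, ours)
        status = "matches" if delta == reported else "differs from"
        print(f"{benchmark} {metric}: delta={delta:+.4f} ({status} table), relative={rel:+.1f}%")
```

Because the two MovieLens-1M rows share the same score for this paper, the smaller absolute gain belongs to the stronger of the two baselines.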