| Benchmark | Metric | Baseline (%) | This Paper (%) | Δ (pts) |
|---|---|---|---|---|
| *ECPO shows stronger interactive capabilities than SFT and other preference optimization methods across three datasets.* | | | | |
| ReDial | Win Rate | 50.0 | 58.4 | +8.4 |
| Multi-WOZ | Win Rate | 50.0 | 64.0 | +14.0 |
| KuaiRec | Win Rate | 50.0 | 61.6 | +11.6 |
| *Comparison against advanced preference optimization baselines (win rates appear to be measured against SFT).* | | | | |
| ReDial | Win Rate | 54.0 | 58.4 | +4.4 |
| *Human evaluation confirms the superiority of the AILO simulator in human-likeness.* | | | | |
| Human Evaluation | Win Rate (Human-likeness) | 0.0 | 100.0 | +100.0 |