| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| Performance of automated agents on InventoryBench (Gemini 3 Flash). The combination of OR and LLM outperforms either in isolation. | | | | |
| InventoryBench | Normalized Profit | 0.445 | 0.538 | +0.093 |
| InventoryBench | Normalized Profit | 0.334 | 0.538 | +0.204 |
| Human-in-the-loop experiment results (Real-data instances). Human collaboration adds value beyond the best automated agent. | | | | |
| Real-data instances | Normalized Profit | 0.534 | 0.584 | +0.050 |
| Real-data instances | Normalized Profit | 0.540 | 0.584 | +0.044 |
| Real-data instances | Estimated Fraction of Positive Complementarity | 0 | 0.203 | +0.203 |
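
The Δ column appears to be the plain difference between the "This Paper" and "Baseline" columns. The sketch below recomputes it from the values in the table, assuming three-decimal rounding; the names `rows` and `delta` are illustrative and not from the paper.

```python
# Rows copied from the table above:
# (benchmark, metric, baseline, this_paper, reported_delta)
rows = [
    ("InventoryBench", "Normalized Profit", 0.445, 0.538, 0.093),
    ("InventoryBench", "Normalized Profit", 0.334, 0.538, 0.204),
    ("Real-data instances", "Normalized Profit", 0.534, 0.584, 0.050),
    ("Real-data instances", "Normalized Profit", 0.540, 0.584, 0.044),
    ("Real-data instances",
     "Estimated Fraction of Positive Complementarity", 0.0, 0.203, 0.203),
]


def delta(baseline: float, this_paper: float) -> float:
    """Assumed definition: Δ = This Paper - Baseline, rounded to 3 decimals."""
    return round(this_paper - baseline, 3)


# Collect any row whose recomputed Δ disagrees with the reported one.
mismatches = [
    row for row in rows
    if abs(delta(row[2], row[3]) - row[4]) > 1e-9
]

for benchmark, metric, baseline, this_paper, reported in rows:
    print(f"{benchmark} | {metric}: {delta(baseline, this_paper):+.3f}")
```

Under this assumed definition, `mismatches` stays empty when every reported Δ matches its row.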