
AI Agents for Inventory Control: Human-LLM-OR Complementarity

Jackie Baek, Yaopeng Fu, Will Ma, Tianyi Peng
arXiv (2026)
Agent Benchmark Reasoning Memory

📝 Paper Summary

Agentic AI, Human-AI Collaboration, Operations Research (OR) integration
Combines traditional Operations Research heuristics, LLM reasoning, and human oversight into a hybrid inventory management pipeline, demonstrating that these components are complementary rather than substitutes.
Core Problem
Each decision-maker has a distinct weakness: traditional inventory algorithms are brittle under demand shifts and blind to context, LLMs lack the mathematical precision needed for stock calculations, and human decision-makers are inconsistent.
Why it matters:
  • Inventory control is fundamental to supply chains but struggles with non-stationary demand (trends, shocks) and unobservable contexts (news, seasonality)
  • Purely algorithmic solutions fail when historical data doesn't reflect the current environment
  • Human-AI teams often fail to outperform the better of the two acting alone; proving genuine complementarity in high-stakes operations is an open challenge
Concrete Example: An OR algorithm that sees a demand spike might treat it as noise and understock, whereas an LLM reading the product description knows that 'swimwear' demand is seasonal. Conversely, an LLM might get the arithmetic of pipeline inventory wrong, which the OR algorithm computes exactly.
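To make the "arithmetic of pipeline inventory" concrete, here is a minimal sketch of the standard inventory-position calculation (on-hand stock plus units in transit, minus backorders). The function name and the numbers are illustrative assumptions, not taken from the paper.

```python
def inventory_position(on_hand: int, pipeline: list[int], backorders: int = 0) -> int:
    """On-hand stock plus every unit ordered but not yet received, minus backorders."""
    return on_hand + sum(pipeline) - backorders


# Hypothetical state: 12 units on the shelf, orders of 5 and 8 units still in transit.
print(inventory_position(on_hand=12, pipeline=[5, 8]))  # 25
```

This is the kind of bookkeeping a deterministic routine gets right every time, which is why the pipeline delegates it to the OR component.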
Key Novelty
OR-Augmented LLM Agents with Human-in-the-Loop
  • Uses a standard OR heuristic (capped base-stock policy) to generate a mathematically grounded 'recommendation' that the LLM can adopt or override based on textual context
  • Implements a 'carry-over insight' memory mechanism where the LLM writes concise memos about structural changes (e.g., 'lead time is actually 3 weeks') to pass to future steps
  • Formalizes 'individual-level complementarity' to show that humans add value to the AI pipeline, rather than just selecting when to use it
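The first two points can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's implementation: the capped base-stock rule is written in its textbook form (order up to a base-stock level, but never more than a per-period cap), while `AgentState`, `decide`, the `LLMStep` signature, and `stub_llm` are hypothetical names standing in for the real agent and model call.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


def capped_base_stock_order(inventory_position: int, base_stock: int, cap: int) -> int:
    """Order up to base_stock, but never more than cap units in one period."""
    return max(0, min(base_stock - inventory_position, cap))


@dataclass
class AgentState:
    # Carry-over insights: short memos written in earlier periods.
    memos: list[str] = field(default_factory=list)


# Hypothetical LLM interface: receives (recommendation, context, memos) and
# returns (override quantity or None to adopt the recommendation, memo or None).
LLMStep = Callable[[int, str, list[str]], tuple[Optional[int], Optional[str]]]


def decide(state: AgentState, inventory_position: int, base_stock: int,
           cap: int, context: str, llm: LLMStep) -> int:
    """One period: OR recommends, the LLM adopts or overrides, memos carry over."""
    recommendation = capped_base_stock_order(inventory_position, base_stock, cap)
    override, memo = llm(recommendation, context, list(state.memos))
    if memo:
        state.memos.append(memo)  # visible to every later period
    return recommendation if override is None else max(0, override)


def stub_llm(recommendation: int, context: str, memos: list[str]):
    """Stand-in for a real model call; overrides only on an obvious seasonal cue."""
    if "swimwear" in context and "summer" in context:
        return recommendation + 10, "Demand looks seasonal; expect a summer peak."
    return None, None


state = AgentState()
order = decide(state, inventory_position=25, base_stock=40, cap=10,
               context="swimwear, summer approaching", llm=stub_llm)
print(order, state.memos)
```

Keeping the OR rule as a pure function means the recommendation stays reproducible even when the override logic changes, and the memo list is the only state passed between periods.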
Architecture
Figure 3: The OR→LLM agent architecture, showing how inputs are processed and decisions are made in each period.
Evaluation Highlights
  • OR→LLM agent (Gemini 3 Flash) achieves 0.538 normalized profit, a 21% improvement over the OR heuristic alone on InventoryBench
  • Human-in-the-loop (Mode B: OR→LLM→Human) significantly outperforms fully automated agents (OR→LLM) and Human-only baselines
  • Theoretical analysis estimates that at least 20.3% of individual participants experience strictly positive complementarity (performing better with AI than either they or the AI could alone)
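The parenthetical definition in the last point can be illustrated with a small sketch. It assumes, purely for illustration, that all three profits are observed for each participant; the paper instead reports an estimated lower bound, so this is not its estimator. The function names and the data are hypothetical.

```python
def complementarity_gain(human_alone: float, ai_alone: float, human_with_ai: float) -> float:
    """Profit of the human-AI team minus the better of the two acting alone."""
    return human_with_ai - max(human_alone, ai_alone)


def share_with_positive_gain(participants: list[tuple[float, float, float]]) -> float:
    """Fraction of participants whose team profit strictly beats both solo profits."""
    positive = sum(1 for p in participants if complementarity_gain(*p) > 0)
    return positive / len(participants)


# Hypothetical normalized profits: (human alone, AI alone, human with AI).
participants = [
    (0.40, 0.54, 0.60),  # team beats both: strictly positive complementarity
    (0.50, 0.54, 0.52),  # team falls short of the AI alone
    (0.60, 0.54, 0.58),  # team falls short of the human alone
    (0.30, 0.54, 0.54),  # team only ties the AI: not strictly positive
]
print(share_with_positive_gain(participants))  # 0.25
```

Comparing against the maximum of the two solo profits, rather than their average, is what separates genuine complementarity from a team that merely lands between its members.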
Breakthrough Assessment
8/10
Strong empirical evidence of Human-AI complementarity in a complex domain, backed by a new benchmark (InventoryBench) and a theoretical framework for measuring individual-level gains.