
AI Metropolis: Scaling Large Language Model-based Multi-Agent Simulation with Out-of-order Execution

Zhiqiang Xie, Hao Kang, Ying Sheng, Tushar Krishna, Kayvon Fatahalian, Christos Kozyrakis
arXiv.org (2024)
Agent Memory Benchmark

📝 Paper Summary

Multi-agent simulation, LLM inference systems
AI Metropolis accelerates LLM-based multi-agent simulations by replacing global time-step synchronization with an out-of-order execution scheduler that tracks spatial-temporal dependencies to maximize parallel inference.
Core Problem
Traditional multi-agent simulations enforce global synchronization: every agent must finish a time step before any agent may proceed. Because LLM response lengths vary widely and agent activity is sparse, most agents sit idle while waiting for the slowest one.
Why it matters:
  • Inference takes ~95% of simulation time; synchronization bottlenecks prevent batching, leading to low GPU utilization and high costs
  • Current approaches, borrowed from Reinforcement Learning (global `step()` functions), artificially limit parallelism by enforcing false dependencies between agents that are not interacting
  • Scalability is poor; adding more compute resources fails to decrease simulation time because the critical path is dominated by the slowest agent in each step
Concrete Example: In a 25-agent village, Agent A is isolated in a house while Agent B converses with Agent C. In standard simulations, Agent A must wait for B and C to finish their conversation (multiple LLM calls) before it can take its next step, even though A's actions cannot affect B or C.
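The cost of the global barrier in this example can be illustrated with a small back-of-the-envelope sketch. The per-step latencies are invented numbers, and `global_sync_time` and `clustered_time` are hypothetical helpers, not part of AI Metropolis. The sketch also assumes unlimited parallel workers and a fixed grouping of agents, which the real system does not require.

```python
# Illustrative LLM latency per step for each agent (invented values).
latencies = {
    "A": [1, 1, 6],  # isolated agent
    "B": [5, 1, 1],  # conversing with C
    "C": [2, 4, 1],  # conversing with B
}

def global_sync_time(latencies):
    """Every step waits for the slowest agent among all agents given."""
    num_steps = len(next(iter(latencies.values())))
    return sum(
        max(per_agent[s] for per_agent in latencies.values())
        for s in range(num_steps)
    )

def clustered_time(latencies, clusters):
    """Agents wait only for members of their own cluster; clusters run in parallel."""
    return max(
        global_sync_time({name: latencies[name] for name in cluster})
        for cluster in clusters
    )

sync_total = global_sync_time(latencies)  # 5 + 4 + 6
clustered_total = clustered_time(latencies, [["A"], ["B", "C"]])  # max(8, 10)
print(sync_total, clustered_total)
```

With these made-up numbers the global barrier costs 15 time units, while letting A run independently of the B/C conversation finishes in 10.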
Key Novelty
Out-of-order Agent Scheduling via Spatiotemporal Dependency Graph
  • Treats simulation steps like instruction scheduling in a CPU: allows agents to process future time steps ahead of others if they are spatially distant (no read-after-write conflicts)
  • Introduces 'Coupled' clusters: dynamically groups agents that interact into small synchronization units, while letting non-interacting agents proceed asynchronously
  • Implements a 'Dependency Graph' that calculates safe execution windows based on agent distance and maximum velocity, removing false global dependencies
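The ideas in the list above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the paper's algorithm: the summary does not give the exact dependency rule, so the condition used here (an agent is blocked by any lagging agent that could come within a perception `radius` given `max_velocity` and the step gap) is an assumed conservative rule, and `Agent`, `blocking_agents`, and `coupled_clusters` are hypothetical names.

```python
import math
from dataclasses import dataclass

@dataclass
class Agent:
    name: str
    x: float
    y: float
    step: int  # next time step this agent wants to execute

def distance(a, b):
    return math.dist((a.x, a.y), (b.x, b.y))

def blocking_agents(agent, others, max_velocity, radius):
    """Names of lagging agents that could still come within `radius` of `agent`.

    Assumed rule: an agent that is `lag` steps behind can travel at most
    max_velocity * lag before catching up in time.
    """
    blockers = []
    for other in others:
        lag = agent.step - other.step
        if lag <= 0:
            continue  # not behind in time, so it cannot hold `agent` back
        if distance(agent, other) <= radius + max_velocity * lag:
            blockers.append(other.name)
    return blockers

def coupled_clusters(agents, radius):
    """Group agents at the same step that are within `radius` (union-find)."""
    parent = list(range(len(agents)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, a in enumerate(agents):
        for j in range(i + 1, len(agents)):
            b = agents[j]
            if a.step == b.step and distance(a, b) <= radius:
                parent[find(i)] = find(j)

    groups = {}
    for i, a in enumerate(agents):
        groups.setdefault(find(i), []).append(a.name)
    return sorted(groups.values())

agents = [
    Agent("A", 0, 0, step=5),
    Agent("B", 100, 0, step=3),
    Agent("C", 4, 0, step=4),
    Agent("D", 101, 0, step=3),
]
blockers = blocking_agents(agents[0], agents[1:], max_velocity=1, radius=4)
clusters = coupled_clusters(agents, radius=4)
```

In this toy world, A is held back only by its nearby, lagging neighbour C; the distant agents B and D form their own coupled cluster and impose no constraint on A.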
Architecture
Figure (Algorithm 3 / System Description): The workflow of AI Metropolis contrasted with standard simulation loops, showing the interaction between the Controller, Ready Queue, Ack Queue, and Workers.
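One way to picture how these four components might fit together is the minimal sketch below. Only the component names come from the summary; everything else is an assumption made for illustration: threads stand in for workers, `llm_call` is a stub, and `depends_on` is a hypothetical static map, whereas the real system derives dependencies dynamically from agent positions.

```python
import queue
import threading

def run_simulation(agents, depends_on, num_steps, num_workers=4,
                   llm_call=lambda agent, step: None):
    ready, acks = queue.Queue(), queue.Queue()  # Ready Queue, Ack Queue
    done = {a: 0 for a in agents}               # steps completed per agent
    in_flight = set()
    log = []                                    # order in which acks arrived

    def worker():
        while True:
            task = ready.get()
            if task is None:                    # shutdown signal
                return
            agent, step = task
            llm_call(agent, step)               # stand-in for LLM inference
            acks.put((agent, step))

    def dispatch():
        # Controller rule (assumed): an agent may run its next step once
        # none of its dependencies is behind it in simulated time.
        for a in agents:
            if a in in_flight or done[a] >= num_steps:
                continue
            if all(done[d] >= done[a] for d in depends_on.get(a, ())):
                in_flight.add(a)
                ready.put((a, done[a]))

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    dispatch()
    while any(done[a] < num_steps for a in agents):
        agent, step = acks.get()                # controller blocks on acks
        in_flight.discard(agent)
        done[agent] = step + 1
        log.append((agent, step))
        dispatch()
    for _ in threads:
        ready.put(None)
    for t in threads:
        t.join()
    return log

# A is independent; B and C are coupled and therefore advance in lockstep.
log = run_simulation(["A", "B", "C"], {"B": ["C"], "C": ["B"]}, num_steps=3)
```

There is no global barrier in this loop: the controller reacts to each acknowledgement individually, so A can complete all of its steps without waiting for B or C.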
Evaluation Highlights
  • Achieves 1.3x to 4.15x speedup over standard parallel simulation with global synchronization
  • Reduces the average number of dependencies per agent from 25 (global sync) to 1.85, effectively removing most false dependencies
  • Performance approaches the theoretical optimum (unconstrained execution) as the number of agents increases, demonstrating high scalability
Breakthrough Assessment
7/10
Significant systems-level optimization for agent simulations. While it doesn't improve agent intelligence, it solves a critical bottleneck (speed/cost) that hinders large-scale agent research.