
Chaotic Dynamics in Multi-LLM Deliberation

Hajime Shimao, Warut Khern-am-nuai, Sung Joo Kim
The Pennsylvania State University, McGill University, American University
arXiv (2026)
Agent Memory Benchmark

📝 Paper Summary

Multi-agent Collective AI systems
Multi-LLM committees exhibit chaotic instability: identical inputs produce diverging outcomes. The instability is driven by role differentiation and model heterogeneity, and it can be mitigated by ablating the Chair role or reducing memory depth.
Core Problem
Multi-LLM committees used for governance are often assumed to be deterministic at temperature T=0, but they exhibit structural instability where nominally identical runs diverge into different decisions.
Why it matters:
  • Reproducibility is a governance property; if identical runs yield different policies, institutions face critical uncertainty.
  • Current evaluations rely on one-shot metrics that miss trajectory sensitivity, leading to a false sense of security.
  • Unpredictability limits controllability and explainability in high-stakes collective AI decision-making.
Concrete Example: In the HL-01 benchmark scenario, five agents debating health policy diverge into different collective mean preferences across runs even at T=0, with divergence growing exponentially over deliberation rounds.
Key Novelty
Stability Auditing for Multi-LLM Committees
  • Models committee deliberation as a random dynamical system to quantify instability using an empirical Lyapunov exponent derived from trajectory divergence.
  • Identifies two distinct, non-additive routes to chaos: institutional role differentiation (e.g., assigning a Chair) and compositional heterogeneity (mixing model families).
  • Demonstrates that instability is not just thermal noise (persists at T=0) but is structurally induced by protocol design, specifically memory depth and synthesis roles.
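To make the notion of an empirical Lyapunov exponent concrete, the sketch below estimates one from two nominally identical runs. It is an illustration under stated assumptions, not the authors' estimator: each run is assumed to be summarized as one scalar per round (for example, the collective mean preference), the exponent is taken as the least-squares slope of log-distance against round number, and the function name `empirical_lyapunov` and the synthetic trajectories are hypothetical.

```python
import math


def empirical_lyapunov(traj_a, traj_b, floor=1e-12):
    """Least-squares slope of log|a_t - b_t| against round index t.

    Assumption: each trajectory is a list of scalars, one per round.
    A positive slope means the two runs separate roughly exponentially.
    """
    if len(traj_a) != len(traj_b) or len(traj_a) < 2:
        raise ValueError("need two trajectories of equal length >= 2")

    # Log-distance per round; the floor avoids log(0) when runs coincide.
    log_d = [math.log(max(abs(a - b), floor)) for a, b in zip(traj_a, traj_b)]

    n = len(log_d)
    t_mean = (n - 1) / 2
    y_mean = sum(log_d) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(log_d))
    var = sum((t - t_mean) ** 2 for t in range(n))
    return cov / var


# Synthetic check (made-up data): run B drifts away from run A
# with a gap that grows like exp(0.1 * t).
run_a = [0.5 for _ in range(10)]
run_b = [0.5 + 1e-3 * math.exp(0.1 * t) for t in range(10)]
lam = empirical_lyapunov(run_a, run_b)
print(f"estimated exponent: {lam:.4f}")
```

In practice one would likely average the log-distance over many run pairs before fitting, since a single pair gives a noisy slope.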
Evaluation Highlights
  • Heterogeneous committees (mixed models) without roles show higher divergence (Lyapunov exponent = 0.0947) than homogeneous baselines.
  • Adding role mandates to homogeneous committees increases instability (Lyapunov exponent increases from 0.0221 to 0.0541).
  • Reducing argument memory depth from k=15 to k=3 consistently lowers divergence across four tested scenarios.
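One way to picture the memory-depth parameter k is as a sliding window over past arguments. The sketch below is hypothetical: the summary does not describe how the protocol stores arguments, so the class name `ArgumentMemory` and its methods are assumptions used only to show what "keep the last k arguments" could mean.

```python
from collections import deque


class ArgumentMemory:
    """Hypothetical bounded memory: retains only the k most recent arguments."""

    def __init__(self, k):
        if k < 1:
            raise ValueError("k must be at least 1")
        # deque with maxlen silently drops the oldest item when full.
        self._items = deque(maxlen=k)

    def add(self, argument):
        self._items.append(argument)

    def context(self):
        """Arguments visible to an agent in the next round, oldest first."""
        return list(self._items)


# Illustrative usage with made-up arguments and the smaller depth k=3.
memory = ArgumentMemory(k=3)
for i in range(5):
    memory.add(f"argument {i}")
visible = memory.context()
print(visible)
```

With a smaller k, each round conditions on less accumulated history, which matches the intuition that shallower memory gives small differences fewer rounds to compound.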
Breakthrough Assessment
8/10
Strong empirical characterization of a critical but overlooked problem (instability at T=0) in multi-agent systems. Provides actionable design principles (role ablation, memory reduction) for governance.