Evaluation Setup
12 policy scenarios across immigration, health, income, climate, speech, and AI governance.
Benchmarks:
- HL-01 (Health Policy) (Multi-round committee deliberation) [New]
- IM-01 (Immigration) (Multi-round committee deliberation) [New]
- CL-01 (Climate) (Multi-round committee deliberation) [New]
- SP-03 (Speech) (Multi-round committee deliberation) [New]
- AI-01 (AI Governance) (Multi-round committee deliberation) [New]
Metrics:
- Empirical Lyapunov exponent (lambda_hat)
- Statistical methodology: Bootstrap confidence intervals computed by replicate resampling.
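
The two metrics above can be sketched together as follows. This is a minimal illustration, not the paper's procedure: it assumes lambda_hat is estimated as the least-squares slope of log mean divergence against deliberation round, and that the bootstrap resamples whole replicates with replacement and reports a percentile interval. The function names (`fit_slope`, `lambda_hat`, `bootstrap_ci`) and the input layout are hypothetical.

```python
import math
import random
from statistics import mean


def fit_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


def lambda_hat(replicates):
    """Assumed estimator: slope of log(mean divergence) versus round index.

    `replicates` is a list of trajectories, one per replicate run; each
    trajectory is a list of positive divergence values, one per round.
    """
    n_rounds = len(replicates[0])
    log_mean = [
        math.log(mean(rep[t] for rep in replicates)) for t in range(n_rounds)
    ]
    return fit_slope(list(range(n_rounds)), log_mean)


def bootstrap_ci(replicates, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval, resampling replicates with replacement."""
    rng = random.Random(seed)
    n = len(replicates)
    estimates = sorted(
        lambda_hat([replicates[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


if __name__ == "__main__":
    # Synthetic trajectories (illustrative only, not data from the paper).
    rng = random.Random(42)
    runs = [
        [0.1 * math.exp(0.05 * t) * rng.uniform(0.8, 1.2) for t in range(10)]
        for _ in range(20)
    ]
    print("lambda_hat:", lambda_hat(runs))
    print("95% CI:", bootstrap_ci(runs))
```

Resampling whole replicates, rather than individual rounds, keeps each trajectory's time ordering intact, which matters because the exponent is a property of the trajectory as a whole.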
Key Results
Results demonstrating the impact of committee composition (Roles vs. NoRoles and Homogeneous vs. Mixed) on stability in the HL-01 benchmark.

| Benchmark | Metric | Baseline | This Paper | Δ |
| --- | --- | --- | --- | --- |
| HL-01 | Lyapunov Exponent (lambda_hat) | 0.0221 | 0.0541 | +0.0320 |
| HL-01 | Lyapunov Exponent (lambda_hat) | 0.0221 | 0.0947 | +0.0726 |
| HL-01 | Lyapunov Exponent (lambda_hat) | 0.0947 | 0.0519 | -0.0428 |
| Average across 4 scenarios | Lyapunov Exponent (lambda_hat) | See Notes | See Notes | Negative |
Main Takeaways
- Instability is design-induced via two routes: institutional differentiation (Roles) and compositional heterogeneity (Mixed Models).
- These routes interact non-additively; adding roles to a mixed committee actually reduced instability compared to the mixed, no-role condition.
- The Chair role is a dominant amplifier of instability; ablating the Chair role yielded the largest reduction in divergence for the HL-01 scenario.
- Shortening the memory window (context length) is an effective intervention that attenuates divergence.
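
The non-additivity claim in the second takeaway can be checked with simple arithmetic on the HL-01 values from the results table. The sketch below rests on an assumption the source does not state explicitly: that 0.0221 is the shared baseline (homogeneous, no roles), that 0.0541 and 0.0947 are the two single-factor conditions, and that 0.0519 is the combined mixed-with-roles condition. The variable names are hypothetical.

```python
# HL-01 lambda_hat values from the results table.
# ASSUMPTION: the mapping of values to conditions is inferred from the table
# and the takeaways; the source does not label the rows.
baseline = 0.0221          # assumed: homogeneous committee, no roles
single_factor_a = 0.0541   # assumed: one design factor applied alone
single_factor_b = 0.0947   # assumed: the other design factor applied alone
combined = 0.0519          # assumed: both factors applied (mixed, with roles)

# If the two routes added independently, the combined condition would sit at
# the baseline plus both single-factor increments.
additive_prediction = (
    baseline + (single_factor_a - baseline) + (single_factor_b - baseline)
)

# A nonzero gap between observed and predicted indicates an interaction.
interaction = combined - additive_prediction

print(f"additive prediction: {additive_prediction:.4f}")
print(f"observed combined:   {combined:.4f}")
print(f"interaction term:    {interaction:+.4f}")
```

Under these assumptions the additive prediction is about 0.1267, well above the observed 0.0519, so the interaction term is about -0.0748. This check uses point estimates only; whether the gap is statistically meaningful depends on the bootstrap intervals, which are not reproduced here.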