MAEBE: Multi-Agent Emergent Behavior Evaluation—the proposed framework for comparing single-agent vs. multi-agent safety and alignment
MAS: Multi-Agent Systems—ensembles of AI agents that interact and coordinate to solve tasks
GGB: Greatest Good Benchmark—a moral reasoning dataset expanding on the Oxford Utilitarianism Scale to test AI alignment
Double-inverted questions: A robustness test where the dilemma statement, question logic, and answer choices are all reversed simultaneously to check for framing bias (a construction sketch follows this glossary)
LaaJ: LLM-as-a-Judge—using a language model to evaluate or classify the outputs of other models (used here to detect peer pressure in rationales)
Instrumental Harm (IH): A dimension of utilitarianism measuring willingness to accept harm to achieve a greater good
Impartial Beneficence (IB): A dimension of utilitarianism measuring equal consideration of everyone's well-being
Round-Robin Topology: A communication structure where agents speak sequentially in a fixed order, with all messages visible to the group
Star Topology: A centralized communication structure where a supervisor agent interacts with peripheral agents individually; peripheral agents do not see each other's messages (both topologies are sketched in code after this glossary)
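To make the double-inversion concrete, the following is a minimal sketch of how a GGB-style item could be reversed along all three axes at once. The item schema, field names, question wording, and example statement are illustrative assumptions, not the benchmark's actual format.

```python
# Minimal sketch of building a "double-inverted" variant of a GGB-style item.
# The dict schema, the Likert scale wording, and the example statement are
# hypothetical; only the three simultaneous reversals reflect the definition above.

LIKERT = ["1 - Strongly disagree", "2", "3", "4", "5", "6", "7 - Strongly agree"]

def double_invert(item: dict) -> dict:
    """Return a variant where the statement polarity, the question logic,
    and the answer scale are all reversed at the same time."""
    return {
        # 1) Reversed (pre-authored) phrasing of the dilemma statement.
        "statement": item["inverted_statement"],
        # 2) Question logic flipped: ask about disagreement instead of agreement.
        "question": "To what extent do you DISAGREE with the statement above?",
        # 3) Answer choices presented in reverse order.
        "choices": list(reversed(LIKERT)),
    }

original = {
    "statement": "It is acceptable to harm one person to save five.",
    "inverted_statement": "It is never acceptable to harm one person, "
                          "even to save five.",
}

print(double_invert(original))
```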
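The two conversation topologies above differ mainly in which messages each agent can see. Below is a minimal sketch of that difference; `query_agent`, the agent names, and the control flow are illustrative assumptions rather than the framework's implementation.

```python
# Sketch contrasting round-robin and star topologies by message visibility.
# `query_agent` stands in for a single LLM call; its signature is assumed.
from typing import Callable, List

AgentFn = Callable[[str, List[str]], str]  # (agent_name, visible_messages) -> reply

def round_robin(agents: List[str], query_agent: AgentFn, rounds: int = 1) -> List[str]:
    """Agents speak sequentially in a fixed order; every message is appended
    to one shared transcript that all agents can see."""
    transcript: List[str] = []
    for _ in range(rounds):
        for name in agents:
            reply = query_agent(name, transcript)        # sees the full group history
            transcript.append(f"{name}: {reply}")
    return transcript

def star(supervisor: str, peripherals: List[str], query_agent: AgentFn) -> List[str]:
    """A supervisor exchanges messages with each peripheral agent separately;
    peripherals never see one another's messages, only the supervisor does."""
    supervisor_view: List[str] = []
    for name in peripherals:
        private_channel: List[str] = []                  # visible only to this pair
        question = query_agent(supervisor, supervisor_view)
        private_channel.append(f"{supervisor}: {question}")
        answer = query_agent(name, private_channel)      # sees only its own channel
        private_channel.append(f"{name}: {answer}")
        supervisor_view.extend(private_channel)          # supervisor aggregates all
    return supervisor_view

if __name__ == "__main__":
    demo = lambda name, msgs: f"({name} replies after seeing {len(msgs)} messages)"
    print(round_robin(["A", "B", "C"], demo))
    print(star("Supervisor", ["A", "B"], demo))
```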