
The Influence of Human-inspired Agentic Sophistication in LLM-driven Strategic Reasoners

Vince Trencsenyi, Agnieszka Mensfelt, Kostas Stathis
arXiv (2025)
Agent Reasoning Benchmark

📝 Paper Summary

Agentic AI · Multi-agent simulation · Game Theory
The paper investigates whether increasing the architectural complexity of LLM-based agents—through decoupled reasoning and role-playing profiles—improves their ability to replicate human strategic behavior in guessing games.
Core Problem
LLMs are increasingly treated as agents, but it is unclear whether their strategic reasoning matches the boundedly rational behavior of humans, or whether adding agentic sophistication merely pushes them toward theoretically optimal play.
Why it matters:
  • Agent-based models in human-centered domains require validation that agents behave reliably and understandably, not just optimally
  • Current game-theoretic benchmarks often lack standardized frameworks for hosting heterogeneous agent architectures (LLMs vs. traditional models)
  • The 'black-box' nature of LLMs creates reproducibility and explainability gaps in social simulations
Concrete Example: In a 2-player guessing game (p=2/3), the theoretically optimal guess is 0. However, humans rarely play 0 immediately. A standard game-theoretic learning model, Experience-Weighted Attraction (EWA), may learn to play much closer to 0 (mean guess 11.19) than humans do (mean guess 29.05), so it fails as a descriptive model of human behavior.
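Why 0 is optimal in the 2-player case can be checked by brute force. The sketch below is illustrative only: it assumes integer guesses in the range 0 to 100 and a target of p times the mean of the two guesses, neither of which is stated in the summary, and the function name `winner` is hypothetical.

```python
from fractions import Fraction

P = Fraction(2, 3)  # multiplier from the example above


def winner(a, b, p=P):
    """Return 0 if guess a wins, 1 if guess b wins, None on a tie.

    Assumption: the target is p times the mean of the two guesses,
    and the guess closest to the target wins.
    """
    target = p * Fraction(a + b, 2)
    dist_a, dist_b = abs(a - target), abs(b - target)
    if dist_a == dist_b:
        return None
    return 0 if dist_a < dist_b else 1


# Exhaustive check over an assumed 0..100 integer range:
# whenever the two guesses differ, the lower one wins.
lower_always_wins = all(
    winner(a, b) == 0 for a in range(0, 101) for b in range(a + 1, 101)
)
print(lower_always_wins)
```

Under these assumptions the lower guess wins every pairing, so guessing 0 can never lose. The gap between that prediction and the reported human mean of 29.05 is what the paper uses as its behavioral target.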
Key Novelty
Human-inspired Agentic Sophistication Framework
  • Decomposes agent design into 'Simple' (one-shot) vs 'Reasoner' (decoupled belief formation and decision) architectures to test if explicit reasoning steps improve human alignment
  • Integrates psychological 'Models of Appropriateness' (MoA) into prompts, forcing agents to ask 'What kind of person am I?' and 'What kind of situation is this?' before acting
  • Uses a centralized 'Umpire' framework to standardize interactions between LLM agents and traditional game-theoretic models (EWA) in guessing games
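The three ideas above can be sketched as a small harness. This is a minimal illustration, not the authors' implementation: all class and parameter names (`SimpleAgent`, `ReasonerAgent`, `Umpire`, `policy`, `form_belief`, `decide`) are hypothetical, the LLM calls are replaced by plain Python callables, and the 0 to 100 guess range is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

History = List[Dict[str, float]]


@dataclass
class SimpleAgent:
    """One-shot agent: a single call maps the history to a guess."""
    name: str
    policy: Callable[[History], float]  # stand-in for one LLM prompt

    def act(self, history: History) -> float:
        return self.policy(history)


@dataclass
class ReasonerAgent:
    """Decoupled agent: form a belief first, then decide on a guess."""
    name: str
    form_belief: Callable[[History], float]  # expected opponent guess
    decide: Callable[[float], float]         # guess given that belief

    def act(self, history: History) -> float:
        belief = self.form_belief(history)
        return self.decide(belief)


@dataclass
class Umpire:
    """Central referee: collects guesses from any agent exposing act()."""
    agents: list
    p: float = 2 / 3
    low: float = 0.0     # assumed lower bound on guesses
    high: float = 100.0  # assumed upper bound on guesses
    history: History = field(default_factory=list)

    def play_round(self):
        # Clamp every guess into the allowed range.
        guesses = {
            a.name: min(max(a.act(self.history), self.low), self.high)
            for a in self.agents
        }
        target = self.p * sum(guesses.values()) / len(guesses)
        best = min(abs(g - target) for g in guesses.values())
        winners = [n for n, g in guesses.items() if abs(g - target) - best < 1e-9]
        self.history.append(guesses)
        return guesses, target, winners


simple = SimpleAgent("simple", policy=lambda h: 50.0)
reasoner = ReasonerAgent(
    "reasoner",
    form_belief=lambda h: 50.0,      # believes the opponent guesses 50
    decide=lambda b: (2 / 3) * b,    # best-responds to that belief
)
guesses, target, winners = Umpire([simple, reasoner]).play_round()
print(guesses, round(target, 2), winners)
```

Because the umpire only depends on an `act` method, a traditional learner such as EWA could be plugged in alongside LLM-backed agents without changing the game loop, which is the standardization benefit the summary describes.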
Evaluation Highlights
  • EWA (traditional model) diverges significantly from human behavior with a Wasserstein distance of 22.34, guessing far lower on average (mean 11.19) than humans (mean 29.05)
  • Human experts show significantly higher skewness (1.50) in their guess distribution compared to students (0.55), establishing a distinct behavioral target for agents
  • Qualitative finding: The relationship between agentic design complexity (adding profiles/reasoning steps) and human-likeness is non-linear, suggesting simple architectural augmentation has limits
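The two statistics reported above can be computed as follows. This is a sketch under stated assumptions: the summary does not say which estimators the paper uses, so the code takes the 1-Wasserstein distance between equal-sized samples and the population (Fisher-Pearson) skewness, and the function names and toy numbers are hypothetical, not data from the paper.

```python
from statistics import fmean


def wasserstein_1d(xs, ys):
    """1-Wasserstein distance between two equal-sized 1-D samples.

    For equal sample sizes this reduces to the mean absolute
    difference between the sorted values.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch only handles equal-sized samples")
    return fmean(abs(x - y) for x, y in zip(sorted(xs), sorted(ys)))


def skewness(xs):
    """Population skewness: third central moment over variance ** 1.5."""
    mu = fmean(xs)
    m2 = fmean((x - mu) ** 2 for x in xs)
    m3 = fmean((x - mu) ** 3 for x in xs)
    return m3 / m2 ** 1.5


# Toy guesses, invented purely for illustration.
toy_agent = [0, 1, 2]
toy_human = [5, 6, 7]
print(wasserstein_1d(toy_agent, toy_human))  # 5.0: every sorted pair differs by 5
print(skewness([0, 0, 0, 10]))               # positive: long right tail
```

A larger Wasserstein distance means the agent's guess distribution sits further from the human one, while positive skewness indicates most guesses are low with a few high outliers, the pattern reported more strongly for experts than for students.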
Breakthrough Assessment
6/10
Solid methodology for evaluating agentic reasoning against human baselines using game theory. The framework is rigorous, though the specific quantitative results for LLM agents (beyond the EWA baseline) are cut off in the provided text.