
Mutual Theory of Mind in Human-AI Collaboration: An Empirical Study with LLM-driven AI Agents in a Real-time Shared Workspace Task

Shao Zhang, Xihuai Wang, Wenhao Zhang, Yongshan Chen, Lan Gao, Dakuo Wang, Weinan Zhang, Xinbing Wang, Ying Wen
Shanghai Jiao Tong University, Northeastern University
arXiv.org (2024)
Agent Reasoning Benchmark

📝 Paper Summary

Human-AI Teams (HATs) · Theory of Mind (ToM) in AI
This study investigates Mutual Theory of Mind (MToM) in human-AI teams and finds that, while an agent's ToM capability improves humans' sense of being understood, bidirectional verbal communication in real-time tasks reduces overall team performance.
Core Problem
Prior Human-AI Team (HAT) research overlooks the Mutual Theory of Mind (MToM) process, in which both parties reason about each other, and how this interplay affects real-time collaboration.
Why it matters:
  • Real-time shared workspace tasks (like disaster response or joint manufacturing) require immediate coordination where the cost of communication can outweigh its benefits.
  • Existing HAT studies often treat agent ToM and communication interactivity in isolation, missing how the bidirectional reasoning process shapes team dynamics.
  • Understanding MToM is critical for designing agents that actually feel collaborative rather than just functional.
Concrete Example: In a frantic cooking game like Overcooked, a human who stops chopping onions to type 'pass me a plate' to the AI (bidirectional communication) loses working time. The burger might burn in the meantime, leading to worse performance than if both had simply worked silently.
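The trade-off in this example can be sketched as a toy timing model. Every number and function name below is invented for illustration and none comes from the paper; the sketch only shows how a few seconds of typing can push a player past the point where unattended food burns.

```python
# Toy timing model of the communication cost described above.
# All constants are hypothetical and chosen only for illustration.
COOK_TIME_S = 10.0   # assumed time until the food is fully cooked
BURN_AFTER_S = 4.0   # assumed grace period before cooked food burns


def busy_time(chop_s: float, typing_s: float = 0.0) -> float:
    """Total time the player is occupied before returning to the stove."""
    return chop_s + typing_s


def food_burns(busy_until_s: float) -> bool:
    """True if the player gets back to the stove only after the food has burned."""
    return busy_until_s > COOK_TIME_S + BURN_AFTER_S


silent = busy_time(chop_s=12.0)                   # works without messaging
chatting = busy_time(chop_s=12.0, typing_s=5.0)   # stops to type a request

print("silent burns:", food_burns(silent))
print("chatting burns:", food_burns(chatting))
```

Under these assumed values the silent player returns within the grace period and the typing player does not. The real effect in the study is measured empirically, not derived from a model like this.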
Key Novelty
Empirical MToM Analysis in Real-Time HATs
  • Conducts a mixed-design study (Communication Level × Agent ToM Capability) to isolate the effects of Mutual Theory of Mind in a shared workspace.
  • Deploys a real LLM-driven agent (GPT-4o mini) rather than a Wizard-of-Oz setup, enabling actual autonomous reasoning and communication.
  • Integrates communication directly into the agent's ToM control framework, where the agent considers both human actions and language to shape its decisions.
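To make the last point concrete, here is a minimal sketch of an agent-side model that combines observed human actions with human messages before choosing what to do. It is a rule-based stand-in, not the paper's LLM-driven method: the class `HumanModel`, its methods, the action names, and the evidence weights are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical mapping from an observed human action to what the human
# probably needs next. These pairs are assumptions, not from the paper.
ACTION_HINTS = {"chop_onion": "plate", "pick_up_plate": "cooked_food"}
KNOWN_ITEMS = ("plate", "onion", "cooked_food")


@dataclass
class HumanModel:
    """Accumulates evidence about the human's current need."""
    evidence: Counter = field(default_factory=Counter)

    def observe_action(self, action: str) -> None:
        # Behavioural evidence: weak, indirect signal.
        need = ACTION_HINTS.get(action)
        if need is not None:
            self.evidence[need] += 1

    def observe_message(self, text: str) -> None:
        # Verbal evidence: an explicit mention is weighted higher (assumption).
        lowered = text.lower()
        for item in KNOWN_ITEMS:
            if item in lowered:
                self.evidence[item] += 2

    def inferred_need(self) -> Optional[str]:
        # The best-supported need, or None if nothing has been observed.
        if not self.evidence:
            return None
        return self.evidence.most_common(1)[0][0]


model = HumanModel()
model.observe_action("chop_onion")
model.observe_message("pass me a plate")
need = model.inferred_need()
print("agent should fetch:", need)
```

In the study the inference step is performed by an LLM (GPT-4o mini) rather than by hand-written rules. The sketch only illustrates the idea that actions and language feed a single estimate of the human's state.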
Architecture
Architecture Figure: Figure 1 (implied from text)
Conceptual diagram of the Mutual Theory of Mind (MToM) process.
Breakthrough Assessment
7/10
Provides counter-intuitive evidence that more communication isn't always better in HATs and validates LLM-driven ToM agents in real-time settings. The focus on 'Mutual' ToM is a valuable conceptual shift.