
Collaborating with AI Agents: Field Experiments on Teamwork, Productivity, and Performance

Harang Ju, Sinan Aral
Johns Hopkins University Carey Business School, Massachusetts Institute of Technology Sloan School of Management
arXiv.org (2025)
Agent MM Benchmark

📝 Paper Summary

Human-AI Collaboration, Agentic Workflows
A large-scale randomized experiment reveals that human-AI teams produce more ads with higher text quality but lower image quality than human-human teams, driven by shifts toward task-oriented communication and increased delegation.
Core Problem
Prior research on AI productivity typically studies chatbots as passive tools rather than active collaborators, so it offers little insight into how multimodal, autonomous agents reshape teamwork processes.
Why it matters:
  • Most existing studies use limited chatbots (not multimodal/agentic) or focus only on individual productivity, missing team-level dynamics.
  • Few rigorous randomized controlled trials (RCTs) measure how active AI agents change 'in vivo' work processes such as communication and delegation.
  • Understanding these dynamics is critical as AI moves from a tool to a teammate in professional workflows.
Concrete Example: In ad creation, a human team might spend time building rapport ('How are you?') and debating edits. An AI-augmented team might skip pleasantries, with the human delegating drafting to the AI and editing less, potentially speeding up text production but failing to catch visual nuances the AI misses.
Key Novelty
Pairit Platform Field Experiment
  • Develops 'Pairit', a collaborative workspace where AI agents can take the same actions as humans (edit text, generate images, chat), enabling direct comparison of human-human vs. human-AI teams.
  • Conducts a large-scale RCT (2,234 participants) combining lab-based ad creation with a real-world field test (5M impressions on X) to measure actual market performance.
  • Identifies two teamwork mechanisms (task-oriented communication and delegation) that mediate productivity gains and quality shifts.
Evaluation Highlights
  • Human-AI teams produced 50% more ads per worker than human-human teams.
  • Human-AI teams delegated 17% more work to their partners and performed 62% fewer direct text edits.
  • Field experiment on X showed human-AI ads had higher click-through rates (driven by better text) while human-human ads had better cost-per-click (driven by better images).
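The metrics named above can be made concrete with a small sketch. The definitions below are the standard advertising ones (click-through rate = clicks / impressions, cost-per-click = spend / clicks), and the relative difference is the usual (treatment - baseline) / baseline. All names (`AdStats`, `ads_per_worker`, `relative_lift`) and every number in the usage note are hypothetical illustrations, not values or code from the paper; counting only human workers in the denominator is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class AdStats:
    """Hypothetical container for one ad's campaign results."""
    impressions: int
    clicks: int
    spend: float  # total spend, in any single currency


def click_through_rate(stats: AdStats) -> float:
    """Clicks divided by impressions (higher is better)."""
    if stats.impressions <= 0:
        raise ValueError("impressions must be positive")
    return stats.clicks / stats.impressions


def cost_per_click(stats: AdStats) -> float:
    """Spend divided by clicks (lower is better)."""
    if stats.clicks <= 0:
        raise ValueError("clicks must be positive")
    return stats.spend / stats.clicks


def ads_per_worker(num_ads: int, num_human_workers: int) -> float:
    """Ads produced per human worker (assumption: AI agents are not counted)."""
    if num_human_workers <= 0:
        raise ValueError("num_human_workers must be positive")
    return num_ads / num_human_workers


def relative_lift(treatment: float, baseline: float) -> float:
    """Relative difference of treatment over baseline, e.g. 0.5 means 50% more."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (treatment - baseline) / baseline
```

As a usage illustration with made-up numbers: if a human-AI team with one human produced 6 ads and a human-human team with two humans produced 8, then `relative_lift(ads_per_worker(6, 1), ads_per_worker(8, 2))` gives 0.5, meaning 50% more ads per worker. Note that the two ad metrics point in opposite directions: a higher click-through rate is better, while a lower cost-per-click is better.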
Breakthrough Assessment
8/10
While not a new model architecture, this is a significant empirical breakthrough. It provides rare, large-scale experimental evidence on *how* agentic AI alters work processes, moving beyond simple 'productivity boost' claims to explain the mechanisms of delegation and communication.