Systematic review and meta-analysis of AI-based conversational agents for promoting mental health and well-being

Han Li, Renwen Zhang, Yi-Chieh Lee, Robert E. Kraut, D. Mohr
National University of Singapore, Carnegie Mellon University, Northwestern University
npj Digit. Medicine (2023)
Agent MM Benchmark

📝 Paper Summary

AI for Mental Health · HCI (Human-Computer Interaction) · Clinical Evaluation of AI Systems
A systematic review and meta-analysis of 35 studies demonstrates that AI-based conversational agents significantly reduce depression and distress, with generative AI models showing larger effect sizes than retrieval-based systems.
Core Problem
While rule-based chatbots are common in mental health, the clinical effectiveness of advanced AI-based agents (using NLP/ML) is under-explored, particularly regarding recent generative models and their impact on specific psychiatric symptoms versus general well-being.
Why it matters:
  • Large Language Models (LLMs) are advancing rapidly and are being deployed in mental health contexts without a consolidated evidence base on their safety or efficacy relative to traditional rule-based systems.
  • Previous reviews focused heavily on rule-based agents or specific disorders, leaving a gap in understanding how technical design choices (e.g., generative vs. retrieval, multimodal vs. text) influence clinical outcomes.
Concrete Example: A retrieval-based agent using predefined scripts might fail to understand a user's complex emotional context, leading to repetitive or generic responses that degrade the therapeutic alliance, whereas a generative agent might offer more personalized support but carries risks of hallucination.
Key Novelty
Meta-analysis of AI-driven (non-rule-based) mental health agents
  • Isolates the effectiveness of AI-based agents (using NLP/ML) specifically, distinguishing them from static rule-based chatbots common in prior literature.
  • Provides the first meta-analytic comparison of clinical effect sizes between generative AI agents (e.g., GPT-based) and retrieval-based NLP agents.
Evaluation Highlights
  • AI-based conversational agents (CAs) significantly reduced psychological distress (Hedges' g = 0.7) and depression symptoms (g = 0.64) compared to control conditions; both effects are moderate to large by conventional benchmarks.
  • Generative AI-based agents demonstrated a substantially larger effect size on distress (g = 1.244) compared to retrieval-based agents (g = 0.523).
  • Multimodal/voice-based agents showed stronger effects (g = 0.828) than text-only agents (g = 0.665).
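For readers unfamiliar with the metric, the sketch below shows how a single study's Hedges' g is typically computed from group means, standard deviations, and sample sizes. It is a minimal illustration, not the authors' analysis code. The function name, the input numbers, and the sign convention (positive g means lower symptom scores in the agent group) are assumptions made for this example.

```python
import math


def hedges_g(mean_treat, sd_treat, n_treat, mean_ctrl, sd_ctrl, n_ctrl):
    """Standardized mean difference with Hedges' small-sample correction.

    Assumed sign convention: a positive value means the treatment group
    ended with LOWER symptom scores than the control group.
    """
    df = n_treat + n_ctrl - 2
    # Pooled standard deviation across the two groups.
    pooled_sd = math.sqrt(
        ((n_treat - 1) * sd_treat ** 2 + (n_ctrl - 1) * sd_ctrl ** 2) / df
    )
    # Cohen's d: difference in means in pooled-SD units.
    d = (mean_ctrl - mean_treat) / pooled_sd
    # Approximate small-sample correction factor J.
    j = 1 - 3 / (4 * df - 1)
    return j * d


# Hypothetical post-intervention distress scores (illustrative only).
g = hedges_g(mean_treat=10.0, sd_treat=5.0, n_treat=30,
             mean_ctrl=14.0, sd_ctrl=5.0, n_ctrl=30)
print(f"Hedges' g = {g:.2f}")
```

With these made-up inputs the uncorrected difference is 0.8 pooled standard deviations, and the correction shrinks it slightly. Conventional rules of thumb treat roughly 0.2 as small, 0.5 as medium, and 0.8 as large.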
Breakthrough Assessment
7/10
Provides strong, aggregated evidence for the efficacy of modern AI in mental health, highlighting a significant performance gap between generative and retrieval approaches, though limited by the high heterogeneity of included studies.
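The heterogeneity caveat can be made concrete with Cochran's Q and the I² statistic, which are commonly used to quantify how much study effects disagree beyond sampling error. The sketch below is illustrative only: the function name and the example effect sizes and variances are hypothetical and are not taken from the reviewed studies.

```python
def heterogeneity(effects, variances):
    """Return (fixed-effect pooled estimate, Cochran's Q, I^2 in percent)."""
    # Inverse-variance weights: more precise studies count more.
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    # Q: weighted squared deviations of each study from the pooled effect.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    # I^2: share of total variation attributed to between-study differences.
    i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, q, i_squared


# Hypothetical per-study Hedges' g values and their variances.
pooled, q, i2 = heterogeneity([0.2, 0.8], [0.04, 0.04])
print(f"pooled g = {pooled:.2f}, Q = {q:.2f}, I^2 = {i2:.1f}%")
```

An I² well above 50% is often read as substantial heterogeneity, in which case a pooled effect size should be interpreted with caution.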