
Advancing Social Intelligence in AI Agents: Technical Challenges and Open Questions

Leena Mathur, P. Liang, Louis-Philippe Morency
Carnegie Mellon University
Conference on Empirical Methods in Natural Language Processing (2024)
Agent MM Reasoning Benchmark

📝 Paper Summary

Socially-Intelligent AI Agents (Social-AI), Social Intelligence, Multimodal Interaction
The paper identifies four core technical challenges—ambiguity in constructs, nuanced signals, multiple perspectives, and agency/adaptation—that researchers must address to build AI agents capable of genuine social intelligence.
Core Problem
Current Social-AI research often abstracts away the richness of social contexts, relying on static data and simplified definitions that fail to capture the ambiguity, nuance, and dynamic nature of real-world social interaction.
Why it matters:
  • Human social interaction is essential for collaboration, caregiving, and negotiation, requiring agents (like robots or assistants) to function seamlessly alongside people.
  • Existing approaches typically model temporally-localized phenomena (split-second moments) and ignore the long-term dynamics and multi-perspective nature of relationships.
  • Social constructs are 'perceiver-dependent' and lack objective ground truth, making standard supervised learning with static labels insufficient for capturing real social phenomena.
Concrete Example: Consider measuring 'rapport' in a conversation. An annotator might label a 100ms pause as 'awkward', while the speakers view it as 'comfortable'. Standard models treat this as a single ground-truth label, failing to capture the misalignment between the actors' internal states and the observer's perception.
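One way to picture the alternative is to store a label per perceiver instead of one label per event. The sketch below is illustrative only and is not taken from the paper: the `PerspectiveLabels` class, its field names, and the pairwise disagreement measure are all hypothetical choices made for this example.

```python
from collections import Counter
from dataclasses import dataclass, field
from itertools import combinations
from typing import Dict


@dataclass
class PerspectiveLabels:
    """Labels for one social event, keyed by perceiver (hypothetical schema)."""
    event: str
    labels: Dict[str, str] = field(default_factory=dict)  # perceiver -> label

    def majority_label(self) -> str:
        """Collapse to a single label, as a static-label pipeline would."""
        return Counter(self.labels.values()).most_common(1)[0][0]

    def disagreement(self) -> float:
        """Fraction of perceiver pairs whose labels differ (0.0 = full agreement)."""
        pairs = list(combinations(self.labels.values(), 2))
        if not pairs:
            return 0.0
        return sum(a != b for a, b in pairs) / len(pairs)


# The pause from the example: both speakers and the annotator each hold a view.
pause = PerspectiveLabels(
    event="pause in conversation",
    labels={
        "speaker_a": "comfortable",
        "speaker_b": "comfortable",
        "annotator": "awkward",
    },
)

print(pause.majority_label())  # the single label a static dataset would keep
print(pause.disagreement())    # how much perspective information that label hides
```

Collapsing to the majority label discards the annotator's view entirely, whereas keeping the per-perceiver mapping preserves the mismatch between actors and observer that the example describes.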
Key Novelty
Formalization of 4 Core Technical Challenges for Social-AI
  • Identifies 'Ambiguity in Constructs' as a fundamental technical hurdle, proposing flexible, dynamically-generated label spaces (e.g., using natural language) rather than static categories.
  • Highlights 'Nuanced Signals', where meaning can hinge on the absence of cues (e.g., silence) or on micro-synchrony, and questions whether standard tokenization or training objectives can capture them.
  • Proposes 'Multiple Perspectives' modeling, where agents must reason about concurrent, interdependent, and changing viewpoints of all actors, rather than a single 'god view' objective.
Architecture
Figure 1
A conceptual visualization of the 4 Core Technical Challenges (A) mapped onto a schematic of Social Contexts (B).
Evaluation Highlights
  • This is a position paper proposing a research agenda; it does not present a new model or quantitative results.
  • Synthesizes progress from 3,257 papers across 6 communities (NLP, ML, Robotics, HCI, Vision, Speech) to identify gaps.
  • Notes that although static benchmarks (e.g., SocialIQa, ToMI) exist, they abstract away the physical and social context required for true social intelligence.
Breakthrough Assessment
9/10
A foundational position paper that crystallizes vague problems in social computing into concrete technical challenges. It reframes Social-AI from 'applying ML to social data' to 'solving unique problems like construct ambiguity'.