
Towards Embodied Agentic AI: Review and Classification of LLM- and VLM-Driven Robot Autonomy and Interaction

Sahar Salimpour, Leijie Fu, Farhad Keramat, L. Militano, G. T. Carughi, Harry Edelman, J. P. Queralta
arXiv.org (2025)
Agent, MM, Reasoning

📝 Paper Summary

Robotic Middleware Integration, Vision-Language-Action (VLA) Models
The authors propose a taxonomy for 'Agentic AI' in robotics that classifies how foundation models interface with middleware such as ROS: as translators, orchestrators, or embedded policies, rather than solely as end-to-end learners.
Core Problem
Prior surveys focus primarily on end-to-end multimodal learning or high-level planning, neglecting the emerging software design patterns where AI agents interface with standard robotic middleware and tools.
Why it matters:
  • Practical deployment relies on integrating LLMs with existing, tested software stacks (like ROS) rather than replacing them entirely
  • Many impactful developments are community-driven (GitHub projects, MCP servers) and remain underrepresented in academic literature
  • There is a lack of clear terminology distinguishing 'end-to-end' control from modular 'agentic' middleware approaches
Concrete Example: Early approaches like ROS2AI simply translated text to CLI commands. Newer frameworks like ROSA need to maintain state, validate parameters, and coordinate multiple tools (navigation, manipulation) safely, requiring a structured architecture beyond simple translation.
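The contrast in this example can be illustrated with a minimal Python sketch. It is not the ROS2AI or ROSA implementation: the tool names, parameter limits, and the `translate` / `ValidatingAgent` names are hypothetical and exist only to show the difference between stateless translation and a stateful layer that validates parameters and tracks which tools are running.

```python
from dataclasses import dataclass, field

# Hypothetical tool registry: names and parameter bounds are illustrative only.
TOOL_LIMITS = {
    "navigate": {"x": (-10.0, 10.0), "y": (-10.0, 10.0)},
    "grip": {"force": (0.0, 40.0)},
}


def translate(tool: str, params: dict) -> str:
    """Stateless translator: formats a command string and performs no checks."""
    args = " ".join(f"{key}={value}" for key, value in sorted(params.items()))
    return f"{tool} {args}".strip()


@dataclass
class ValidatingAgent:
    """Stateful layer: validates parameters and tracks which tools are busy."""

    busy: set = field(default_factory=set)   # tools currently running
    log: list = field(default_factory=list)  # commands that passed validation

    def call(self, tool: str, params: dict) -> str:
        if tool not in TOOL_LIMITS:
            raise ValueError(f"unknown tool: {tool}")
        if tool in self.busy:
            raise RuntimeError(f"tool already running: {tool}")
        for name, (low, high) in TOOL_LIMITS[tool].items():
            if name not in params:
                raise ValueError(f"missing parameter: {name}")
            if not low <= params[name] <= high:
                raise ValueError(f"{name}={params[name]} outside [{low}, {high}]")
        self.busy.add(tool)
        command = translate(tool, params)
        self.log.append(command)
        return command

    def finish(self, tool: str) -> None:
        """Mark a tool as idle so it can be called again."""
        self.busy.discard(tool)
```

In this sketch, `translate("grip", {"force": 99.0})` happily emits a command, whereas `ValidatingAgent().call("grip", {"force": 99.0})` raises a `ValueError` before anything reaches the robot.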
Key Novelty
Taxonomy of Agentic Integration and Roles
  • Classifies integration into four distinct approaches: Protocol (translator), Interface (interactive loop), Orchestration (resource manager), and Embedded (direct policy)
  • Distinguishes agent roles based on functional design: Planners (generate sequence upfront) vs. Orchestrators (active runtime management of subsystems)
  • Highlights the shift from centralized control to decentralized protocols (e.g., FABRIC in OpenMind) and plugin-based architectures (MCP servers)
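As a rough illustration of the taxonomy above, the four integration approaches can be written as an enumeration, and the planner/orchestrator distinction as two small functions. This is only a sketch under stated assumptions: the names `Integration`, `plan_upfront`, and `orchestrate`, and the `is_available` callback, are hypothetical and do not come from the paper or from any of the frameworks it reviews.

```python
from enum import Enum
from typing import Callable, Iterable


class Integration(Enum):
    """The four integration approaches named in the summary."""

    PROTOCOL = "translator"
    INTERFACE = "interactive loop"
    ORCHESTRATION = "resource manager"
    EMBEDDED = "direct policy"


def plan_upfront(steps: Iterable[str]) -> list:
    """Planner role: the full sequence is fixed before execution starts."""
    return list(steps)


def orchestrate(steps: Iterable[str], is_available: Callable[[str], bool]):
    """Orchestrator role (toy version): checks each subsystem at runtime.

    Steps whose subsystem is available are executed; the rest are deferred.
    `is_available` is a hypothetical stand-in for a real health check.
    """
    executed, deferred = [], []
    for step in steps:
        if is_available(step):
            executed.append(step)
        else:
            deferred.append(step)
    return executed, deferred
```

A planner would return the same list regardless of robot state, while the orchestrator's result depends on what `is_available` reports at the moment each step is considered.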
Breakthrough Assessment
7/10
A timely systematization of the rapidly growing 'middleware agent' space in robotics. While it doesn't propose a new model, the taxonomy provides necessary structure for comparing disparate systems such as ROSA, RAI, and RT-2.