
A Novel Hierarchical Multi-Agent System for Payments Using LLMs

Joon Kiat Chua, Donghao Huang, Zhaoxia Wang
School of Computing and Information Systems, Singapore Management University; Research and Development, Mastercard
arXiv (2026)
Agent Benchmark

📝 Paper Summary

Multi-agent collaboration, Agentic payments
HMASP is a hierarchical multi-agent framework that modularizes payment tasks into distinct agent roles (Conversation, Supervisor, Routing, Summary) to enable secure, end-to-end agentic payments with minimal infrastructure changes.
Core Problem
Current LLM agents cannot complete payments end to end because traditional payment infrastructure (anti-bot mechanisms, fraud controls, PCI DSS compliance requirements) blocks them, and existing solutions require major structural changes to payment rails.
Why it matters:
  • Compliance frameworks such as PCI DSS strictly govern how cardholder data is handled, including by AI systems, making direct integration of standard agents difficult because of data-exposure risks
  • Traditional payment networks employ anti-bot mechanisms that inherently reject automated agentic interactions
  • LLM hallucinations pose severe threats to the reliability and security of financial transactions
Concrete Example: When an external agent tries to "Register a new card," the request fails at the network boundary because of authentication controls. In HMASP, a dedicated Routing Agent triggers an isolated card-registration subgraph that validates the 16-digit card number's check digit (Luhn check) before execution, preventing invalid card data from reaching the payment rail.
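The Luhn check is a standard mod-10 checksum over the card number's digits. Below is a minimal Python sketch of the kind of pre-execution guard a registration subgraph could apply; the function names and the strict 16-digit length rule are illustrative assumptions, not HMASP's actual code.

```python
def luhn_checksum_ok(number: str) -> bool:
    """Return True if a string of ASCII digits passes the Luhn (mod 10) check."""
    total = 0
    # Walk right to left; double every second digit, subtracting 9 if it exceeds 9.
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def validate_card_for_registration(raw: str) -> bool:
    """Hypothetical guard: exactly 16 ASCII digits and a valid Luhn checksum."""
    number = raw.replace(" ", "").replace("-", "")
    if len(number) != 16 or not (number.isascii() and number.isdigit()):
        return False
    return luhn_checksum_ok(number)
```

The Luhn check only catches typing errors, such as any single mistyped digit and most swaps of adjacent digits. It says nothing about whether the card exists or is authorized, so it complements network-side checks rather than replacing them.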
Key Novelty
Hierarchical Multi-Agent System for Payments (HMASP)
  • Deconstructs payment processing into four hierarchical levels: a conversational entry point, domain supervisors, task-specific routers, and process summarizers
  • Isolates sensitive payment data in decoupled state variables that agents can reference but never generate, so hallucinated values cannot enter critical financial tasks and those tasks execute deterministically
  • Uses a structured handoff protocol where agents pass control downstream to execute tasks and propagate summary outcomes back upstream without exposing raw internal states
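The layered handoff and the decoupled state described above can be illustrated with a toy Python sketch. It is a simplification under stated assumptions: keyword rules stand in for the LLM-driven Conversation, Supervisor, and Routing agents, a single supervisor replaces HMASP's domain supervisors, and every function and field name is hypothetical. The point it shows is structural: sensitive values live in a state object that only deterministic task code reads, and only a short summary travels back upstream.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class PaymentState:
    """Hypothetical decoupled state: sensitive fields are set by deterministic
    code and read only by task executors; agents never write them."""
    user_message: str
    card_number: Optional[str] = None
    amount_cents: Optional[int] = None
    log: List[str] = field(default_factory=list)


# Task subgraphs: deterministic functions that read sensitive fields from state.
def register_card(state: PaymentState) -> Tuple[bool, str]:
    if state.card_number is None:
        return False, "no card provided"
    return True, f"card ending {state.card_number[-4:]} registered"


def checkout(state: PaymentState) -> Tuple[bool, str]:
    if state.amount_cents is None:
        return False, "no amount provided"
    return True, f"charged {state.amount_cents} cents"


def routing_agent(state: PaymentState) -> Tuple[bool, str]:
    # Stand-in for an LLM router: pick one task subgraph by keyword.
    tasks = {"register": register_card, "pay": checkout, "checkout": checkout}
    text = state.user_message.lower()
    for keyword, task in tasks.items():
        if keyword in text:
            state.log.append(f"routing -> {task.__name__}")
            return task(state)
    return False, "no matching task"


def summary_agent(ok: bool, detail: str) -> str:
    # Only a short outcome goes back upstream, never the raw state.
    return f"{'success' if ok else 'error'}: {detail}"


def supervisor_agent(state: PaymentState) -> str:
    state.log.append("supervisor -> routing")
    ok, detail = routing_agent(state)
    return summary_agent(ok, detail)


def conversation_agent(state: PaymentState) -> str:
    # Entry point: off-topic input never reaches a payment workflow.
    payment_words = ("pay", "card", "checkout", "register")
    if not any(word in state.user_message.lower() for word in payment_words):
        return "Sorry, I can only help with payments."
    state.log.append("conversation -> supervisor")
    return supervisor_agent(state)


if __name__ == "__main__":
    state = PaymentState("Please register a new card", card_number="4111111111111111")
    print(conversation_agent(state))  # success: card ending 1111 registered
    print(conversation_agent(PaymentState("tell me a joke")))  # Sorry, I can only help with payments.
```

Because the card number is passed through `PaymentState` rather than through agent-generated text, a misbehaving router can at worst pick the wrong task; it cannot invent card digits.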
Architecture
Figure 1: The hierarchical structure of HMASP, detailing the flow from user input to payment execution and back
Evaluation Highlights
  • 99.6% task success rate on payment checkout tasks with GPT-4.1; the open-weight Qwen2.5:32b model achieves a comparable 95.6%
  • 99.9% F1-score for agent handoffs (routing each request to the correct agent) with GPT-4.1, demonstrating reliable orchestration
  • 100% rejection rate for irrelevant inputs (e.g., "tell me a joke") across the top-performing models (GPT-4.1, Qwen2.5:32b), preventing off-topic requests from triggering payment workflows
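The summary does not say how the handoff F1-score is averaged. One plausible reading is sketched below in Python: each handoff is treated as a multi-class prediction of which agent should receive control, per-agent F1 = 2TP / (2TP + FP + FN) is macro-averaged, and the rejection rate is the fraction of irrelevant inputs that were refused. The function names and the averaging choice are assumptions.

```python
from collections import Counter
from typing import Dict, List, Sequence


def per_label_f1(gold: Sequence[str], pred: Sequence[str]) -> Dict[str, float]:
    """F1 for each handoff target, treating routing as multi-class classification."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # control handed to the wrong agent
            fn[g] += 1  # the correct agent was missed
    scores = {}
    for label in set(gold) | set(pred):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Unweighted mean of the per-label F1 scores."""
    scores = per_label_f1(gold, pred)
    return sum(scores.values()) / len(scores)


def rejection_rate(rejected: List[bool]) -> float:
    """Fraction of irrelevant inputs that were rejected (True = rejected)."""
    if not rejected:
        raise ValueError("need at least one irrelevant input")
    return sum(rejected) / len(rejected)
```

For single-label routing, micro-averaged F1 equals plain accuracy, so the averaging choice matters mainly when some handoff targets are rare.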
Breakthrough Assessment
7/10
Significant as the first proposed framework for end-to-end agentic payments that works within existing payment rails. Highly practical, although the evaluation is simulation-based.