
A Robust and Efficient Multi-Agent Reinforcement Learning Framework for Traffic Signal Control

Sheng-You Huang, Hsiao-Chuan Chang, Yen-Chi Chen, Ting-Han Wei, I-Hau Yeh, Sheng-Yao Kuan, Chien-Yao Wang, Hsuan-Han Lee, I-Chen Wu
arXiv (2026)
RL Agent

📝 Paper Summary

Traffic Signal Control (TSC) · Multi-Agent Reinforcement Learning (MARL)
A multi-agent reinforcement learning framework for traffic control that combines randomized training, exponential phase adjustments, and neighbor-based observations to improve robustness and scalability.
Core Problem
Existing RL traffic control methods overfit to static training patterns, lack safety-critical stability in action spaces, and struggle to scale coordination to large networks.
Why it matters:
  • Traffic congestion costs the U.S. economy over $85 billion annually (2025 data), with drivers losing 50–112 hours to delays
  • Standard RL agents memorize fixed timing schedules instead of learning dynamics, failing when real-world traffic flows fluctuate
  • Centralized approaches do not scale to large city grids due to exponential state-space growth, while local approaches fail to coordinate green waves
Concrete Example: In standard training, if traffic always arrives at a fixed rate, an agent implicitly memorizes 'switch after 15s' rather than reacting to queue lengths. When deployed in an environment where flow varies, this brittle policy fails to clear sudden platoons, causing gridlock.
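The contrast between a memorized schedule and a demand-reactive policy can be shown in a minimal sketch (function names, thresholds, and the minimum-green constraint are illustrative assumptions, not taken from the paper):

```python
def fixed_timer_policy(elapsed_s: float) -> bool:
    """Brittle memorized rule: switch after 15 s regardless of demand."""
    return elapsed_s >= 15

def queue_reactive_policy(queue_len: int, cross_queue_len: int,
                          min_green_s: float, elapsed_s: float) -> bool:
    """Switch only when cross traffic dominates the current direction
    and a minimum green time has elapsed (a common safety constraint)."""
    return elapsed_s >= min_green_s and cross_queue_len > queue_len
```

Under a sudden platoon on the cross street, the timer policy keeps its memorized schedule while the reactive policy switches as soon as the minimum green allows.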
Key Novelty
Robust CTDE with Exponential Control
  • Turning Ratio Randomization: Perturbs traffic turning probabilities during training to prevent agents from overfitting to static flow patterns
  • Exponential Phase Duration Adjustment: A cyclic action space using exponential steps (e.g., ±1s, ±2s, ±4s) to allow both fine-tuning for stability and large jumps for responsiveness
  • Neighbor-Based CTDE: Uses Centralized Training with Decentralized Execution where agents only observe immediate neighbors, balancing global coordination with scalable local communication
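The first two mechanisms can be sketched in a few lines (all names, step sizes, jitter magnitudes, and safety bounds below are illustrative assumptions, not values from the paper):

```python
import random

# Exponential phase-duration adjustments: small steps for stable
# fine-tuning, large jumps for responsiveness to sudden demand.
PHASE_ACTIONS = [-8, -4, -2, -1, 0, +1, +2, +4, +8]  # seconds

def apply_phase_action(duration_s: int, action_idx: int,
                       min_s: int = 5, max_s: int = 60) -> int:
    """Adjust the current green-phase duration by an exponential step,
    clamped to assumed safety bounds."""
    return max(min_s, min(max_s, duration_s + PHASE_ACTIONS[action_idx]))

def randomize_turning_ratios(base_ratios, jitter=0.1, rng=random):
    """Perturb per-movement turning probabilities at the start of each
    training episode so agents cannot memorize a static flow pattern."""
    noisy = [max(1e-3, r + rng.uniform(-jitter, jitter)) for r in base_ratios]
    total = sum(noisy)
    return [r / total for r in noisy]  # renormalize to a valid distribution
```

The clamp keeps every action inside a safe phase-duration range, so even the largest exponential jump cannot produce an unsafe timing plan.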
Evaluation Highlights
  • Reduces average waiting time by over 10% compared to standard RL baselines in unseen traffic scenarios
  • Demonstrates superior generalization to dynamic flow variations where baselines suffer from overfitting
  • Maintains high control stability through the proposed exponential adjustment mechanism, avoiding the oscillation issues of binary switching methods
Breakthrough Assessment
7/10
Solid engineering improvements for RL-TSC. The exponential action space and randomization are practical solutions to known overfitting/stability issues, though the fundamental algorithm (MAPPO) is standard.