Evaluation Setup
Offline multi-task MARL: policies are trained on fixed, pre-collected datasets and evaluated on both seen and unseen scenarios
Benchmarks:
- SMAC (StarCraft Multi-Agent Challenge)
- SMACv2 (StarCraft Multi-Agent Challenge v2, with stochastic spawns)
- MPE (Multi-Agent Particle Environments)
- MaMuJoCo (Multi-Agent MuJoCo, continuous control)
Metrics:
- Win rate
- Average return
- Statistical methodology: Not explicitly reported in the paper
Main Takeaways
- The paper claims consistent improvements over baselines (ODIS, HiSSD) across diverse benchmarks (SMAC, MPE, MaMuJoCo), particularly in generalizing to tasks with different numbers of agents.
- Qualitative attention-map analysis shows that a prior method (HiSSD) spreads attention nearly uniformly, while STAIRS-Former concentrates attention on critical entities and history tokens.
- The use of token dropout is claimed to be critical for robustness when the number of agents/entities in the test set differs from the training set.
- The hierarchical history module allows the model to leverage long-term dependencies, which are underutilized in standard UPDeT-based architectures.
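The token-dropout claim can be illustrated with a minimal NumPy sketch (function name and details are hypothetical; the paper's actual implementation is not specified here): randomly dropping whole entity tokens during training keeps the model from binding to a fixed entity count, so it degrades gracefully when test tasks add or remove agents.

```python
import numpy as np

def entity_token_dropout(tokens: np.ndarray, drop_prob: float,
                         rng: np.random.Generator) -> np.ndarray:
    """Randomly drop whole entity tokens (rows) during training.

    tokens: (n_entities, d) per-entity embeddings fed to the transformer.
    Always keeps at least one token so the forward pass stays valid.
    """
    keep = rng.random(tokens.shape[0]) >= drop_prob
    if not keep.any():                       # edge case: everything dropped
        keep[rng.integers(tokens.shape[0])] = True
    return tokens[keep]

rng = np.random.default_rng(0)
tokens = rng.normal(size=(8, 16))            # e.g. 8 entities, 16-dim embeddings
kept = entity_token_dropout(tokens, drop_prob=0.3, rng=rng)
print(kept.shape)                            # some (k, 16) with 1 <= k <= 8
```

Because the transformer is permutation-invariant over tokens, the surviving subset is still a valid input, which is exactly what makes a variable number of entities at test time unproblematic.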
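The hierarchical-history idea can be sketched the same way (mean-pooling here is a hypothetical stand-in; the paper's module is presumably learned): a long per-step history is compressed into a few coarse summary tokens that the transformer attends to alongside current entity tokens, making long-term dependencies cheap to access.

```python
import numpy as np

def pool_history(history: np.ndarray, chunk: int) -> np.ndarray:
    """Compress a (T, d) observation history into coarse summary tokens
    by mean-pooling fixed-size chunks. A learned hierarchical module
    would replace the mean with trainable summarization."""
    T, d = history.shape
    n = T // chunk                      # number of coarse tokens
    trimmed = history[: n * chunk]      # drop an incomplete trailing chunk
    return trimmed.reshape(n, chunk, d).mean(axis=1)

history = np.arange(32 * 8, dtype=float).reshape(32, 8)  # 32 steps, 8-dim obs
coarse = pool_history(history, chunk=4)
print(coarse.shape)  # → (8, 8)
```

The payoff is quadratic-cost reduction: attending over 8 summary tokens instead of 32 raw steps shrinks the history's share of the attention matrix while preserving a coarse view of the whole trajectory.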