Evaluation Setup
Cooperative coverage task in a continuous 2D world
Benchmarks:
- simple_spread_v3 (PettingZoo MPE): cooperative landmark coverage
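The benchmark's shared coverage signal can be sketched as follows. This is an assumed form modeled on the simple_spread reward in PettingZoo's MPE suite. The team term is minus the summed distance from each landmark to its nearest agent. A local term adds -1 per collision, and `local_ratio` blends the two. The agent radius of 0.15 and the helper names are illustrative assumptions; the paper's exact reward configuration is not reported here.

```python
import math

Point = tuple[float, float]


def team_coverage_reward(agents: list[Point], landmarks: list[Point]) -> float:
    """Shared team term (assumed form): minus the sum, over landmarks,
    of the distance from each landmark to its nearest agent."""
    return -sum(min(math.dist(a, lm) for a in agents) for lm in landmarks)


def collision_count(i: int, agents: list[Point], agent_size: float = 0.15) -> int:
    """Number of other agents overlapping agent i; every agent is treated
    as a disc of radius `agent_size` (assumed value)."""
    return sum(
        1
        for j, other in enumerate(agents)
        if j != i and math.dist(agents[i], other) < 2 * agent_size
    )


def mixed_reward(i: int, agents: list[Point], landmarks: list[Point],
                 local_ratio: float = 0.5) -> float:
    """Blend of the shared team term and agent i's collision penalty.
    local_ratio=0.0 leaves only the team-based signal."""
    local = -float(collision_count(i, agents))
    team = team_coverage_reward(agents, landmarks)
    return (1 - local_ratio) * team + local_ratio * local
```

With two agents crowding one landmark, the uncovered landmark dominates the team term. Setting `local_ratio=0.0` corresponds to training on the team-based reward alone.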
Metrics:
- Mean Episode Reward
- Coordination Score (Distinct landmarks covered / Total landmarks)
- Statistical methodology: Not explicitly reported in the paper
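The two metrics above can be computed roughly as shown below. This is a minimal sketch assuming a landmark counts as covered when at least one agent lies within a fixed `radius`. The 0.1 default is a hypothetical threshold, since the paper's coverage criterion is not given. Counting landmarks rather than agents makes coverage "distinct": two agents on one landmark count once.

```python
import math
from statistics import mean

Point = tuple[float, float]


def coordination_score(agents: list[Point], landmarks: list[Point],
                       radius: float = 0.1) -> float:
    """Distinct landmarks covered / total landmarks.
    `radius` is a hypothetical coverage threshold."""
    covered = sum(
        1 for lm in landmarks
        if any(math.dist(a, lm) < radius for a in agents)
    )
    return covered / len(landmarks)


def mean_episode_reward(episode_returns: list[float]) -> float:
    """Average of per-episode (summed) team returns."""
    return mean(episode_returns)
```

For example, agents at (0, 0), (0.05, 0), and (0, 1) with landmarks at (0, 0), (1, 0), and (0, 1) give a score of 2/3, because the first two agents share a landmark.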
Key Results
| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| simple_spread_v3 | Episode Reward (higher is better) | -145 | -110 | +35 |
| simple_spread_v3 | Incomplete Coverage Rate (lower is better) | 0 | 9 | +9 |

Δ = This Paper - Baseline.
- Episode Reward: training curves demonstrate the learning progression of the agents.
- Incomplete Coverage Rate: coordination analysis reveals persistent minor inefficiencies.
Main Takeaways
- Agents successfully learn to coordinate and cover distinct landmarks without communication, driven solely by a team-based reward signal
- Emergent behavior includes spatial separation and role specialization, visualized through distinct non-overlapping trajectories
- Performance plateaus after roughly 500 episodes, suggesting rapid initial learning followed by fine-tuning
- The lightweight IPPO (Independent PPO) approach is sufficient for solving basic cooperative MARL tasks without heavy algorithmic overhead
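The IPPO idea can be sketched without any framework. Each agent runs its own PPO update using only its own value estimates and action log-probabilities, with no centralized critic, and the shared team reward is the common learning signal. This sketch uses generalized advantage estimation (GAE) and the standard clipped surrogate. `gamma`, `lam`, and `clip_eps` are common PPO defaults, not values reported by the paper. The trajectory dictionary layout is a hypothetical choice, and done-masking at episode boundaries is omitted for brevity.

```python
import math


def gae(rewards: list[float], values: list[float], last_value: float,
        gamma: float = 0.99, lam: float = 0.95) -> list[float]:
    """Generalized advantage estimates for one agent's trajectory segment,
    bootstrapped with `last_value` (use 0.0 at a true episode end)."""
    advantages = [0.0] * len(rewards)
    next_value, running = last_value, 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


def ppo_clip_loss(new_logp: list[float], old_logp: list[float],
                  advantages: list[float], clip_eps: float = 0.2) -> float:
    """Negative PPO clipped surrogate objective (a loss to minimize)."""
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)
        clipped = max(min(ratio, 1 + clip_eps), 1 - clip_eps)
        # Pessimistic bound: take the smaller of unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)


def ippo_losses(batch: dict[str, dict], team_rewards: list[float],
                gamma: float = 0.99, lam: float = 0.95,
                clip_eps: float = 0.2) -> dict[str, float]:
    """One independent PPO loss per agent. Every agent sees the same team
    reward, but advantages and ratios come only from its own critic and policy."""
    losses = {}
    for agent, traj in batch.items():
        adv = gae(team_rewards, traj["values"], traj["last_value"], gamma, lam)
        losses[agent] = ppo_clip_loss(traj["new_logp"], traj["old_logp"], adv, clip_eps)
    return losses
```

Because nothing is shared between the per-agent losses, the method avoids the extra machinery of centralized critics or communication channels. Whether the paper shared network parameters across agents is not stated here.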