
Gymnasium: A Standard Interface for Reinforcement Learning Environments

Mark Towers, Ariel Kwiatkowski, Jordan K. Terry, John U. Balis, G. Cola, T. Deleu, Manuel Goulão, Andreas Kallinteris, Markus Krimmel, KG Arjun, Rodrigo Perez-Vicente, Andrea Pierré, Sander Schulhoff, Jun Jet Tai, Hannah Tan, Omar G. Younis
University of Southampton, Meta AI, FAIR
arXiv.org (2024)
RL Benchmark

📝 Paper Summary

Reinforcement Learning (RL) · Software Engineering · AI Standardization
Gymnasium is the maintained, standardized successor to OpenAI Gym, introducing a functional API for hardware acceleration, expanded vectorization support, and rigorous versioning to ensure reproducible RL research.
Core Problem
The previous standard for RL environments, OpenAI Gym, ceased maintenance in 2021, leading to stagnation, lack of support for modern hardware acceleration, and reproducibility issues due to inconsistent implementations.
Why it matters:
  • Lack of standardization hinders comparison between RL algorithms and slows progress
  • Modern RL research requires massive scale (millions to billions of steps), which stateful, object-oriented APIs cannot deliver efficiently on hardware accelerators (e.g., via JAX)
  • Unmaintained software creates technical debt and bugs that invalidate research findings
Concrete Example: In OpenAI Gym, a single `done` flag conflated infinite-horizon tasks with time-limited episodes, biasing the agent's value estimation. Gymnasium fixes this by explicitly separating 'termination' (the agent reached a terminal state) from 'truncation' (a time limit was reached), clarifying the signal sent to the algorithm.
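The distinction can be illustrated with a minimal, self-contained sketch. This is a toy environment mimicking Gymnasium's five-tuple `step` signature, not Gymnasium's actual classes; all names here are illustrative:

```python
class ToyChainEnv:
    """Walk right along a chain of states: reaching the final state
    terminates the episode; exhausting the step budget truncates it.
    (Toy illustration of the (obs, reward, terminated, truncated, info)
    step signature -- not Gymnasium's real implementation.)"""

    def __init__(self, length=10, max_steps=5):
        self.length = length
        self.max_steps = max_steps

    def reset(self):
        self.pos = 0
        self.steps = 0
        return self.pos, {}  # (observation, info)

    def step(self, action):
        self.pos += 1 if action == 1 else 0
        self.steps += 1
        # Termination: a natural terminal state of the MDP.
        terminated = self.pos >= self.length - 1
        # Truncation: an artificial cutoff, orthogonal to the MDP itself.
        truncated = (not terminated) and self.steps >= self.max_steps
        reward = 1.0 if terminated else 0.0
        return self.pos, reward, terminated, truncated, {}

env = ToyChainEnv(length=10, max_steps=5)
obs, info = env.reset()
done = False
while not done:
    obs, reward, terminated, truncated, info = env.step(1)
    done = terminated or truncated
# With a 5-step budget on a 10-state chain, the episode ends by truncation.
```

The practical consequence for a learning algorithm: on `truncated` it should still bootstrap its value target from the final state (the episode could have continued), whereas on `terminated` the target is the reward alone.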
Key Novelty
Functional Environment API (FuncEnv) & Strict Standardization
  • Introduces `FuncEnv`, a stateless functional API mirroring POMDP theory (separate `transition`, `reward`, `observation` functions), enabling seamless vectorization and JAX-based hardware acceleration
  • Formalizes the distinction between episode `termination` (natural end) and `truncation` (artificial time limit) to correct theoretical inconsistencies in value estimation
  • Expands `VectorEnv` to support arbitrary vectorization methods, crucial for high-throughput training
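The functional style can be sketched as pure functions over explicit state. The signatures below are illustrative of the POMDP-style split, not Gymnasium's exact `FuncEnv` API (which additionally handles RNG keys and initial/terminal conditions):

```python
from typing import NamedTuple

class State(NamedTuple):
    """Explicit environment state -- passed in and returned, never stored."""
    position: float
    velocity: float

def transition(state: State, action: float) -> State:
    """Pure dynamics: the next state depends only on (state, action)."""
    velocity = state.velocity + 0.1 * action
    return State(state.position + velocity, velocity)

def reward(state: State, action: float, next_state: State) -> float:
    """Pure reward function over the transition triple."""
    return -abs(next_state.position)  # reward staying near the origin

def observation(state: State) -> float:
    """Partial observability: the agent sees only the position."""
    return state.position

# Because the functions are stateless, vectorization is just mapping them
# over a batch of states -- the pattern that jax.vmap automates on
# accelerators.
states = [State(0.0, 0.0), State(1.0, 0.0)]
next_states = [transition(s, 1.0) for s in states]
```

Keeping `transition`, `reward`, and `observation` separate and side-effect-free is what makes the API compatible with JAX program transformations (`vmap`, `jit`), which require pure functions.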
Evaluation Highlights
  • Over 18 million downloads (Nov 2023 – May 2025)
  • Widely adopted ecosystem with over 800 Pull Requests from 40+ contributors
  • Includes a suite of built-in environments (Classic Control, Box2D, MuJoCo, Toy Text) serving as standard baselines
Breakthrough Assessment
9/10
While not a new algorithmic invention, it is the foundational infrastructure for the entire RL field. Its adoption is critical for reproducibility and enabling next-gen hardware-accelerated RL.