SPS: Steps Per Second—a measure of environment throughput
PPO: Proximal Policy Optimization—a standard reinforcement learning algorithm used here to verify training dynamics
JAX: A Python library for high-performance numerical computing whose programs are compiled by XLA to run on CPU, GPU, or TPU
XLA: Accelerated Linear Algebra—a domain-specific compiler for linear algebra that optimizes JAX code
Sim-to-sim gap: Discrepancy in agent performance when transferring a policy trained in one simulator to another purportedly identical simulator
TOST: Two One-Sided Tests, a statistical procedure that tests whether the difference between two sets of data lies within a pre-specified equivalence margin, rather than merely failing to show that they differ
vmap: Vectorizing map—a JAX transform that automatically vectorizes a function to run over a batch of inputs
jax.lax.scan: A JAX primitive that loops over a sequence (such as time steps) while carrying state; the loop body is traced and compiled once rather than unrolled, which lets an entire RL rollout run inside a single compiled XLA program
L1/L2/L3/L4: The four levels of verification, in order: property tests (L1), interaction tests (L2), rollout comparison (L3), and cross-backend policy transfer (L4)
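
As an illustration of the vmap and jax.lax.scan entries above, here is a minimal sketch of how the two transforms are commonly combined to roll out a batch of environments. The `step` dynamics, the `rollout` helper, and the batch sizes are hypothetical stand-ins chosen for this example, not the environment or code described in this document.

```python
import jax
import jax.numpy as jnp


def step(state, action):
    # Hypothetical toy dynamics: a 1-D point mass nudged by the action.
    pos, vel = state
    vel = vel + 0.1 * action
    pos = pos + 0.1 * vel
    reward = -jnp.abs(pos)
    # scan expects (new_carry, per_step_output).
    return (pos, vel), reward


def rollout(init_state, actions):
    # jax.lax.scan threads the state through time and stacks the rewards.
    final_state, rewards = jax.lax.scan(step, init_state, actions)
    return final_state, rewards


# vmap maps the single-environment rollout over a leading batch axis;
# jit compiles the whole batched rollout with XLA.
batched_rollout = jax.jit(jax.vmap(rollout))

num_envs, num_steps = 8, 16  # illustrative sizes only
init_states = (jnp.zeros(num_envs), jnp.zeros(num_envs))
actions = jnp.ones((num_envs, num_steps))

final_states, rewards = batched_rollout(init_states, actions)
print(rewards.shape)  # (num_envs, num_steps)
```

Under these assumptions, SPS for such a rollout would be estimated as `num_envs * num_steps` divided by the wall-clock time of one call, measured after the first (compiling) call and after calling `block_until_ready()` on the outputs.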