Environment Shaping: The manual process of modifying rewards, observations, actions, and dynamics to make a reinforcement learning (RL) problem solvable
Reference Environment: A set of sample task instances (e.g., varying object positions) used to guide shaping, distinct from the test set
Shaped Environment: The modified environment f(E_ref) used for training, containing dense rewards and simplified dynamics
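The relationship between a reference environment and its shaped counterpart can be illustrated with a small wrapper. This is a minimal sketch, not the paper's implementation: the toy environment, class names, and the dense negative-distance bonus are all illustrative assumptions.

```python
import numpy as np

class ToyReachEnv:
    """Stand-in reference environment (hypothetical): 2D point reaching a goal,
    with only a sparse reward at the goal. Not from the original text."""
    def __init__(self, goal=np.array([1.0, 1.0])):
        self.goal = goal
        self.pos = np.zeros(2)

    def reset(self):
        self.pos = np.zeros(2)
        return self.pos.copy()

    def step(self, action):
        # Clip actions to a small step size, as many robotics envs do.
        self.pos = self.pos + np.clip(action, -0.1, 0.1)
        sparse_reward = 1.0 if np.linalg.norm(self.pos - self.goal) < 0.05 else 0.0
        return self.pos.copy(), sparse_reward

class ShapedEnv:
    """One possible shaped environment f(E_ref): wraps the reference env and
    adds a dense negative-distance term so partial progress is rewarded."""
    def __init__(self, env, dense_weight=0.1):
        self.env = env
        self.dense_weight = dense_weight

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, sparse_reward = self.env.step(action)
        # Dense shaping term: closer to the goal -> less negative bonus.
        dense_bonus = -self.dense_weight * np.linalg.norm(obs - self.env.goal)
        return obs, sparse_reward + dense_bonus
```

Under this sketch, a policy that moves toward the goal sees its per-step reward rise even before the sparse success signal fires, which is the point of shaping.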
Bi-level Optimization: An optimization problem where one problem is embedded within another; here, finding the environment that produces the best trained policy
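The bi-level structure above can be written compactly as follows. The notation is assumed for illustration (it may differ from the paper's): $\mathcal{F}$ is the space of shaping functions, and $J_{E}(\pi)$ denotes the expected return of policy $\pi$ in environment $E$.

```latex
f^{\star} \;=\; \arg\max_{f \in \mathcal{F}} \; J_{E_{\text{test}}}\!\left(\pi^{\star}_{f}\right)
\quad \text{s.t.} \quad
\pi^{\star}_{f} \;=\; \arg\max_{\pi} \; J_{f(E_{\text{ref}})}(\pi)
```

The inner problem (solved here by PPO) trains a policy in the shaped environment $f(E_{\text{ref}})$; the outer problem scores that policy on the unmodified test environment and searches over shapings $f$.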
PPO: Proximal Policy Optimization—a standard RL algorithm used as the solver in the inner loop
IsaacGymEnvs: A suite of GPU-accelerated robotics environments used as a standard benchmark
Oracle Distribution: The true, complex distribution of real-world scenarios the robot will face, which is difficult to model perfectly in simulation
Sim-to-Real: Training a robot in a simulator and transferring the learned policy to a physical robot
Eureka: A recent LLM-based method for automating reward design (cited as a partial solution to environment shaping)