
Reinforcement learning for versatile, dynamic, and robust bipedal locomotion control

Zhongyu Li, Xue Bin Peng, Pieter Abbeel, Sergey Levine, Glen Berseth, K. Sreenath
University of California, Berkeley; Simon Fraser University; Université de Montréal; Mila – Quebec AI Institute
Int. J. Robotics Res. (2024)
RL Memory Benchmark

📝 Paper Summary

Legged Locomotion, Sim-to-Real Transfer
A reinforcement learning framework built on a dual-history policy architecture, which combines short-term feedback with a long-term history of the robot's inputs and outputs, enables a bipedal robot to perform agile walking, running, and jumping with zero-shot sim-to-real transfer.
Core Problem
Controlling bipedal robots is difficult due to their high-dimensional, nonlinear, and underactuated dynamics, where distinct skills (like walking vs. jumping) typically require specialized, handcrafted contact plans.
Why it matters:
  • Traditional model-based optimal control is computationally expensive and struggles with real-time whole-body planning for diverse agile skills
  • Prior RL methods often focus on single skills (e.g., just walking) or fail to transfer highly dynamic aperiodic motions (like jumping) to the real world without fine-tuning
Concrete Example: Running introduces a flight phase where the robot is underactuated and unstable; standard walking controllers that rely on orbital stability fail here, and model-based methods often cannot re-plan contact sequences fast enough for real-world disturbances.
Key Novelty
Dual-History Policy Architecture with Multi-Stage Training
  • Incorporates two history streams: a 'short' history (4 steps) for immediate feedback control and a 'long' history (66 steps, about 2 seconds) processed by a CNN to implicitly identify system dynamics
  • Utilizes a training curriculum that moves from single-task learning to 'task randomization' (varying goals) and finally 'dynamics randomization', fostering robustness and disturbance compliance
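The three-stage curriculum above can be sketched as a per-episode sampler. This is only an illustration under stated assumptions: the stage names, the `EpisodeConfig` fields, and every numeric range are hypothetical placeholders, not values from the paper.

```python
import random
from dataclasses import dataclass

# Hypothetical stage labels mirroring the three curriculum stages in the summary.
STAGES = ("single_task", "task_randomization", "dynamics_randomization")


@dataclass
class EpisodeConfig:
    target_speed: float    # commanded forward speed in m/s (illustrative goal)
    friction_scale: float  # multiplier on nominal ground friction
    mass_scale: float      # multiplier on nominal link mass


def sample_episode(stage: str, rng: random.Random) -> EpisodeConfig:
    """Sample one training episode for the given curriculum stage."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")

    # Stage 1: a single fixed goal with nominal dynamics.
    speed, friction, mass = 1.0, 1.0, 1.0

    # Stages 2 and 3: the goal varies per episode (placeholder range).
    if stage != "single_task":
        speed = rng.uniform(0.0, 2.0)

    # Stage 3 only: the simulated dynamics vary as well (placeholder ranges).
    if stage == "dynamics_randomization":
        friction = rng.uniform(0.5, 1.5)
        mass = rng.uniform(0.8, 1.2)

    return EpisodeConfig(speed, friction, mass)


rng = random.Random(0)
for stage in STAGES:
    print(stage, sample_episode(stage, rng))
```

Each stage keeps what the previous one introduced, so the policy first learns the skill, then learns to generalize over goals, and only then is exposed to perturbed dynamics.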
Architecture
Architecture figure (Fig. 3)
The control policy architecture showing the dual-history processing streams.
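A minimal sketch of the bookkeeping behind the two history streams, assuming both streams store observation-action pairs and that the buffer starts zero-filled. The class name `DualHistory` and its methods are hypothetical; only the window lengths (4 and 66 steps) come from the summary. The 66-step window spanning about 2 seconds implies a control rate of roughly 33 Hz.

```python
from collections import deque

SHORT_LEN = 4   # short history length (from the summary)
LONG_LEN = 66   # long history length, about 2 s (from the summary)


class DualHistory:
    """Rolling buffer of (observation, action) pairs with two read-outs."""

    def __init__(self, obs_dim: int, act_dim: int):
        self.pair_dim = obs_dim + act_dim
        zero = [0.0] * self.pair_dim
        # Zero-filled at start so both views always have a fixed shape
        # (an assumption made for this sketch).
        self.buf = deque([zero] * LONG_LEN, maxlen=LONG_LEN)

    def push(self, obs, act):
        """Append the newest pair; the oldest one is dropped automatically."""
        pair = list(obs) + list(act)
        if len(pair) != self.pair_dim:
            raise ValueError("unexpected observation/action size")
        self.buf.append(pair)

    def short(self):
        """Flattened last SHORT_LEN pairs, for the direct feedback path."""
        recent = list(self.buf)[-SHORT_LEN:]
        return [x for pair in recent for x in pair]

    def long(self):
        """LONG_LEN x pair_dim sequence, oldest first, for a temporal encoder."""
        return [list(pair) for pair in self.buf]


history = DualHistory(obs_dim=2, act_dim=1)
for t in range(70):
    history.push([float(t), 0.0], [1.0])
print(len(history.short()), len(history.long()))
```

In the summarized design the long sequence would go through a CNN before being combined with the short history; the encoder itself is left out here.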
Evaluation Highlights
  • Running: Completed a 400-meter dash in 2 minutes 34 seconds on the Cassie robot (an average of approximately 2.6 m/s), outperforming prior RL methods that could not sustain turning or long-distance running
  • Jumping: Demonstrated a standing long jump of 1.4 m and a vertical box jump of 0.44 m, significantly exceeding prior controller capabilities (e.g., a maximum leap of 0.41 m)
  • Robustness: Zero-shot transfer to real hardware with the ability to recover from unexpected external forces and adapt to hardware changes over a one-year timespan
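The average running speed quoted for the 400-meter dash can be checked from the reported distance and time. The helper name `average_speed` is introduced here only for illustration.

```python
def average_speed(distance_m: float, minutes: int, seconds: float) -> float:
    """Average speed in m/s for a distance covered in minutes plus seconds."""
    total_s = 60 * minutes + seconds
    if total_s <= 0:
        raise ValueError("elapsed time must be positive")
    return distance_m / total_s


# 400 m in 2 min 34 s is 400 m over 154 s.
speed = average_speed(400.0, 2, 34)
print(f"{speed:.2f} m/s")  # prints "2.60 m/s"
```

This is an average over the whole run, so peak speed during the dash may have been higher.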
Breakthrough Assessment
9/10
Demonstrates unprecedented versatility on a bipedal platform, unifying walking, running, and jumping in one framework with successful zero-shot sim-to-real transfer and strong physical benchmarks such as the 400 m dash.