
Real-world humanoid locomotion with reinforcement learning

Ilija Radosavovic, Tete Xiao, Bike Zhang, Trevor Darrell, Jitendra Malik, Koushil Sreenath
University of California, Berkeley
Science Robotics (2023)
RL Memory

📝 Paper Summary

Humanoid Robotics, Legged Locomotion
A causal transformer controller, trained via large-scale reinforcement learning on randomized simulations, enables a blind humanoid robot to traverse diverse outdoor terrains and adapt to disturbances zero-shot.
Core Problem
Classical controllers for humanoids struggle to generalize to unstructured environments, while previous learning-based methods (like LSTMs or explicit estimators) often fail to capture the long-term context needed for robust adaptation.
Why it matters:
  • Humanoids have high potential for general-purpose labor but require controllers that function in diverse, unstructured real-world environments
  • Designing explicit estimators for every terrain property (friction, compliance) is brittle and difficult to scale
Concrete Example: When a blind robot's foot gets trapped by a step, classical controllers or simple policies often fail to react, leading to a fall. The proposed model uses history to detect the collision and lifts the leg higher on the next attempt.
Key Novelty
Causal Transformer for Locomotion
  • Hypothesizes that a history of proprioceptive observations and actions implicitly encodes environment properties (like terrain friction or obstacles).
  • Uses a Causal Transformer to process this history, allowing the policy to perform 'in-context learning'—adapting behavior (e.g., gait changes) at test time without updating weights.
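To make the idea of a causal transformer over a proprioceptive history concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the single attention layer, the per-step concatenation of observation and action into one token, the function names (`causal_self_attention`, `policy_step`, `init_params`), and all dimensions are assumptions chosen for illustration. What it shows is the causal mask, which lets each time step attend only to itself and earlier steps, and a policy head that reads the most recent step to produce the next action.

```python
import numpy as np


def causal_self_attention(tokens, w_q, w_k, w_v):
    """Single-head self-attention where step t attends only to steps <= t."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Causal mask: the strict upper triangle marks future steps, which are blocked.
    future = np.triu(np.ones(scores.shape, dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    # Row-wise softmax (the diagonal is never masked, so each row has a finite max).
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


def init_params(obs_dim, act_dim, d_model, seed=0):
    """Random weights for the sketch; a real policy would be trained with RL."""
    rng = np.random.default_rng(seed)
    shapes = {
        "embed": (obs_dim + act_dim, d_model),
        "w_q": (d_model, d_model),
        "w_k": (d_model, d_model),
        "w_v": (d_model, d_model),
        "head": (d_model, act_dim),
    }
    return {name: rng.normal(scale=0.1, size=shape) for name, shape in shapes.items()}


def policy_step(obs_history, act_history, params):
    """Predict the next action from a window of past observations and actions.

    obs_history: (T, obs_dim), act_history: (T, act_dim). Pairing them one
    token per time step is an assumption of this sketch.
    """
    tokens = np.concatenate([obs_history, act_history], axis=-1) @ params["embed"]
    attended = causal_self_attention(
        tokens, params["w_q"], params["w_k"], params["w_v"]
    )
    hidden = tokens + attended  # residual connection
    return hidden[-1] @ params["head"]  # action read out from the latest step


if __name__ == "__main__":
    # Hypothetical sizes: 16-step window, 6-D observation, 3-D action.
    params = init_params(obs_dim=6, act_dim=3, d_model=8)
    rng = np.random.default_rng(1)
    obs = rng.normal(size=(16, 6))
    act = rng.normal(size=(16, 3))
    print(policy_step(obs, act, params).shape)
```

Because the weights stay fixed at deployment, any adaptation has to come from what the attention layer reads out of the history window, which is the sense in which the summary uses "in-context learning".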
Architecture
Figure 7 (implied): Inference pipeline using a Causal Transformer.
Evaluation Highlights
  • Achieved zero falls during one week of full-day testing in outdoor environments including plazas, sidewalks, and grass fields.
  • Successfully traversed real-world slopes of up to 8.7% grade and maintained stability under external disturbances like pushes and yoga ball throws.
  • Demonstrated emergent behavioral adaptation, such as altering gait for slopes and recovering from foot-trapping events, which were not explicitly programmed.
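For readers more used to incline angles than percent grades, the 8.7% slope figure above can be converted with the standard rise-over-run relation. The helper name `grade_to_degrees` is hypothetical; the conversion itself is plain trigonometry and puts an 8.7% grade at roughly 5 degrees.

```python
import math


def grade_to_degrees(grade_percent):
    """Convert a percent grade (rise / run * 100) to an incline angle in degrees."""
    return math.degrees(math.atan(grade_percent / 100.0))


if __name__ == "__main__":
    print(round(grade_to_degrees(8.7), 2))
```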
Breakthrough Assessment
9/10
Demonstrates highly robust, zero-shot sim-to-real transfer for a full-sized humanoid on difficult terrains using a pure learning-based approach, outperforming commercial model-based controllers in stability.