
Fast ECoT: Efficient Embodied Chain-of-Thought via Thoughts Reuse

Zhekai Duan, Yuan Zhang, Shikai Geng, Gaowen Liu, J. Boedecker, Chris Xiaoxuan Lu
University of Edinburgh, University of Freiburg
arXiv.org (2025)
MM Reasoning Agent

📝 Paper Summary

Embodied AI, Robotic Manipulation, Vision-Language-Action (VLA) Models
Fast ECoT accelerates robotic reasoning by caching recurring high-level thoughts and parallelizing reasoning steps, decoupling them from action generation to reduce latency without model retraining.
Core Problem
Embodied Chain-of-Thought (ECoT) requires generating long, sequential reasoning traces autoregressively at every control step, introducing massive latency that makes real-time robotic control impractical.
Why it matters:
  • High latency causes robots to idle while 'thinking', slowing down the control loop significantly
  • Complex tasks require longer reasoning chains, compounding delays and creating a trade-off between interpretability and responsiveness
  • Real-world deployment requires reaction times faster than the seconds-long delays typical of current VLA reasoning models
Concrete Example: In a standard ECoT setup, a robot might wait ~5 seconds to generate a full plan, sub-goals, and visual features before emitting a simple 'move gripper' action. By the time the action is ready, the environment may have changed, and the resulting motion is jerky and slow.
Key Novelty
Fast Embodied Chain-of-Thought (Fast ECoT)
  • Exploits 'temporal locality' in robotic reasoning: high-level plans change slowly, so previous reasoning steps can be cached and reused rather than regenerated from scratch
  • Converts the sequential dependency between reasoning steps into a parallel batch process in which multiple reasoning modules (Plan, Sub-task, etc.) are generated simultaneously from cached prefixes
  • Decouples action generation from reasoning via an asynchronous scheduler, allowing the robot to act immediately on the latest observation while reasoning updates in the background
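The three ideas above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the module names, the `Thought` record, and the helpers `generate_thought`, `act`, and `control_loop` are all hypothetical, and the model call is replaced by a stub. The sketch only shows the data flow, assuming each reasoning module is conditioned on thoughts cached from the previous control step, so all modules can be launched together while the action is emitted from the cache.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

# Hypothetical module chain: the summary names Plan and Sub-task; "move" is illustrative.
MODULES = ("plan", "subtask", "move")


@dataclass(frozen=True)
class Thought:
    module: str
    step: int            # control step whose observation produced this thought
    prefix_steps: tuple  # steps of the cached thoughts it was conditioned on


def generate_thought(module, step, cached):
    """Stand-in for one model decoding call (no real VLA model is invoked)."""
    earlier = MODULES[:MODULES.index(module)]
    # Condition on thoughts cached from a previous step instead of waiting
    # for the earlier modules to finish in the current step.
    prefix = tuple(cached[m].step for m in earlier if m in cached)
    return Thought(module, step, prefix)


def act(step, cache):
    """Emit an action right away from the newest observation and the cached plan."""
    plan = cache.get("plan")
    return (step, plan.step if plan is not None else None)


def control_loop(num_steps):
    cache, pending, actions = {}, {}, []
    with ThreadPoolExecutor(max_workers=len(MODULES)) as pool:
        for step in range(num_steps):
            # Collect the thoughts launched during the previous step. Blocking here
            # keeps the sketch deterministic; a real scheduler would poll instead.
            for module, future in pending.items():
                cache[module] = future.result()
            actions.append(act(step, cache))
            # Launch every module at once on a snapshot of the cache.
            snapshot = dict(cache)
            pending = {m: pool.submit(generate_thought, m, step, snapshot)
                       for m in MODULES}
    return actions, cache


actions, cache = control_loop(3)
print(actions)  # [(0, None), (1, 0), (2, 1)]
```

Each tuple pairs the control step with the step at which the plan in use was generated: the action at step 2 relies on the plan from step 1, which is the one-step staleness that temporal locality is meant to make acceptable.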
Architecture
Figure 3: Comparison between sequential ECoT generation and the proposed Fast ECoT parallel generation framework.
Evaluation Highlights
  • Reduces inference latency by up to 7.5× compared to standard ECoT on real-world robot tasks (Standard: ~5.5s vs. Fast ECoT Async: ~0.7s)
  • Achieves the highest success rate (80.0%) on the LIBERO simulation benchmark, surpassing both the original ECoT (74.8%) and the non-reasoning OpenVLA (75.8%)
  • Maintains high reasoning faithfulness (measured by the Action Faithfulness metric), indicating that the accelerated reasoning still reflects the decision-making process
Breakthrough Assessment
8/10
Significantly mitigates the primary bottleneck (latency) of VLA reasoning models without retraining. The asynchronous parallelization strategy is a practical system-level innovation that makes 'thinking' robots viable for real-time control.