| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| Simulation results demonstrating superior performance on standard benchmarks. | | | | |
| LIBERO-LONG | Success Rate | Not reported in the paper | Not reported in the paper | Not reported in the paper |
| LIBERO (Average) | Success Rate | Not reported in the paper | Not reported in the paper | 2.4% |
| CALVIN (ABC-D) | Average Task Length | Not reported in the paper | Not reported in the paper | 0.22 |
| CALVIN (ABC-D) | Average Task Length | Not reported in the paper | Not reported in the paper | 0.25 |
| Real-world experiments validating effectiveness in physical environments. | | | | |
| Real-world Long-Horizon | Success Rate | Not reported in the paper | Not reported in the paper | 18.3% |
| Real-world Continual Learning | Success Rate | Not reported in the paper | Not reported in the paper | 21% |