Comparative analysis against state-of-the-art reasoning models shows Embodied-Reasoner consistently achieving higher success rates, with the gap widening significantly on complex tasks.

| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| AI2-THOR Tasks (Average) | Success Rate | Not reported in the paper | Not reported in the paper | +9% |
| AI2-THOR Tasks (Average) | Success Rate | Not reported in the paper | Not reported in the paper | +24% |
| AI2-THOR Tasks (Average) | Success Rate | Not reported in the paper | Not reported in the paper | +13% |
| Composite Tasks | Success Rate | Not reported in the paper | Not reported in the paper | +39.9% |