Evaluation Setup
Six simulated manipulation tasks (3 in 2D with Box2D, 3 in 3D with PyBullet), plus real-world validation on a Franka Panda robot.
Benchmarks:
- Push (2D) (Push puck to goal) [New]
- Fetch Cube (3D) (Retrieve object from under overhang) [New]
- Lift Cup (3D) (Lift cup with random geometry) [New]
Metrics:
- Episode Return (Reward)
- Success Rate (Real world)
- Statistical methodology: Results averaged over 6 random seeds (3 for Scoop 3D). Standard error reported via shaded regions in plots.
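The per-seed aggregation described above can be sketched as follows. This is a minimal illustration, assuming the standard error is the sample standard deviation divided by the square root of the number of seeds. The function name and the input values are hypothetical and are not taken from the paper.

```python
import math
import statistics


def mean_and_standard_error(returns_per_seed):
    """Return (mean, standard error) of episode returns across random seeds."""
    n = len(returns_per_seed)
    if n < 2:
        raise ValueError("need at least two seeds to estimate a standard error")
    mean = statistics.fmean(returns_per_seed)
    # Sample standard deviation (n - 1 denominator) divided by sqrt(n).
    sem = statistics.stdev(returns_per_seed) / math.sqrt(n)
    return mean, sem


# Placeholder returns for six seeds; illustrative values only, not paper results.
placeholder_returns = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
mean, sem = mean_and_standard_error(placeholder_returns)
print(f"mean={mean:.3f}, standard error={sem:.3f}")
```

In a learning-curve plot, the same computation would be applied at each training step, with the shaded region drawn at the mean plus and minus the standard error.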
Key Results
Performance comparison against baselines across simulated environments. Values estimated from learning curves (Figure 4) at convergence.

| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| Push (2D) | Episode Return | 450 | 800 | +350 |
| Fetch Cube (3D) | Episode Return | 100 | 380 | +280 |
| Fetch Cube (3D) | Return | Not applicable | High performance maintained | Not applicable |
| Fetch Cube (Real Robot) | Success Rate | 10/12 | 10/12 | 0 |
Main Takeaways
- The framework consistently outperforms stochastic optimization (CMA-ES) and joint optimization baselines (HWasP) in sample efficiency and final performance.
- Learned policies exhibit strong zero-shot generalization to unseen goal locations and can design appropriate tools for novel situations.
- The tradeoff parameter α effectively controls the ratio of material usage vs. control energy: higher α leads to smaller tools requiring more energetic control, and vice versa.
- Real-world experiments confirm that 3D-printed tools designed by the policy are effective (100% success on specific instances), though no single tool solves all task variations.
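The role of the tradeoff parameter α in the takeaways above can be illustrated with a small sketch. The summary does not state the actual reward formula, so the convex combination below is an assumption chosen only because it reproduces the described behavior. The function name, the normalized cost values, and the two candidate designs are all hypothetical.

```python
def tradeoff_penalty(material_usage, control_energy, alpha):
    """Hypothetical penalty: alpha weights material usage, (1 - alpha) weights control energy.

    Assumed form for illustration only; not the paper's reward definition.
    Both costs are assumed to be normalized to [0, 1].
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * material_usage + (1.0 - alpha) * control_energy


# Two hypothetical designs as (material_usage, control_energy) pairs.
small_tool = (0.2, 0.8)  # little material, more energetic control
large_tool = (0.8, 0.2)  # more material, gentler control

for alpha in (0.1, 0.9):
    small = tradeoff_penalty(*small_tool, alpha)
    large = tradeoff_penalty(*large_tool, alpha)
    preferred = "small tool" if small < large else "large tool"
    print(f"alpha={alpha}: small={small:.2f}, large={large:.2f}, preferred={preferred}")
```

Under this assumed form, a high α makes material the dominant cost, so the smaller tool is preferred despite its higher control energy. A low α reverses the preference.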