Evaluation Setup
Comparison of personalization efficiency and effectiveness in simulation and real-world studies
Benchmarks:
- Cooking Simulation (2D stove-top meal preparation) [New]
- Cleaning Simulation (Robotic arm dusting surfaces) [New]
- Books Simulation (Fetching and handing over books) [New]
- Assisted Feeding Study (Web-based user preference study & Real robot demo) [New]
Metrics:
- User satisfaction (Likert scale)
- Prediction accuracy (categorical choices)
- Statistical methodology: Wilcoxon signed-rank test for user study results
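The Wilcoxon signed-rank test compares paired ratings, such as one participant's Likert scores for two conditions, without assuming the differences are normally distributed. The sketch below is a minimal pure-Python version with an exact two-sided p-value, offered only as an illustration of the test. The function name `signed_rank_test` and the ratings in the usage example are hypothetical and are not taken from the study.

```python
from itertools import groupby


def signed_rank_test(paired_a, paired_b):
    """Two-sided exact Wilcoxon signed-rank test for paired samples.

    Returns (w_plus, w_minus, p_value). Zero differences are dropped.
    """
    diffs = [a - b for a, b in zip(paired_a, paired_b) if a != b]
    if not diffs:
        raise ValueError("all paired differences are zero")

    # Rank absolute differences, giving tied values their average rank.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    position = 0  # number of differences already ranked
    for _, group in groupby(order, key=lambda i: abs(diffs[i])):
        members = list(group)
        average = position + (len(members) + 1) / 2  # mean of the 1-based ranks
        for i in members:
            ranks[i] = average
        position += len(members)

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

    # Exact null distribution: each rank is positive or negative with
    # probability 1/2. Ranks are doubled so that tied (half-integer)
    # ranks become integers usable as list indices.
    doubled = [int(round(2 * r)) for r in ranks]
    total = sum(doubled)
    counts = [0] * (total + 1)  # counts[s] = sign patterns with positive sum s
    counts[0] = 1
    for r in doubled:
        for s in range(total, r - 1, -1):
            counts[s] += counts[s - r]

    observed = int(round(2 * min(w_plus, w_minus)))
    lower_tail = sum(counts[: observed + 1]) / 2 ** len(doubled)
    return w_plus, w_minus, min(1.0, 2 * lower_tail)


if __name__ == "__main__":
    # Made-up Likert ratings (1 to 5) from ten hypothetical participants.
    personalized = [5, 4, 5, 4, 5, 5, 4, 5, 3, 5]
    baseline = [3, 3, 4, 2, 4, 3, 3, 4, 4, 3]
    w_plus, w_minus, p = signed_rank_test(personalized, baseline)
    print(f"W+ = {w_plus}, W- = {w_minus}, p = {p:.4f}")
```

The exact enumeration suits the small samples typical of user studies. For larger samples, a normal approximation or a statistics library would be the usual choice.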
Key Results
| Benchmark | Metric | Baseline | This Paper | Δ |
| --- | --- | --- | --- | --- |
| Assisted Feeding (Web Study) | p-value (Wilcoxon signed-rank) | Not reported in the paper | <0.005 | Significant preference |
Main Takeaways
- CBTL consistently outperforms baselines (No Personalization, Free Explore, Exploit Only, Epsilon-Greedy) across three diverse simulation domains, avoiding overfitting while exploring efficiently.
- Active learning (entropy maximization) is critical; naive exploration (Free Explore) wastes time, while purely greedy methods (Exploit Only) overfit to initial successes (e.g., reusing the first successful book handover pose forever).
- The method carries learned constraints across real-world tasks: occlusion preferences learned during a 'feeding' task transferred to a 'drinking' task without additional training, demonstrating compositional generalization.
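The contrast between greedy and entropy-driven selection in the second takeaway can be illustrated with a toy example. This is a generic uncertainty-sampling sketch, not the paper's procedure. The pose names, belief values, and both helper functions are hypothetical, and it assumes the learner holds an estimated acceptance probability for each candidate action.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a Bernoulli outcome with success probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def pick_exploit(success_prob):
    """Exploit Only: repeat whichever option currently looks best."""
    return max(success_prob, key=success_prob.get)


def pick_max_entropy(success_prob):
    """Active learning: try the option whose outcome is most uncertain."""
    return max(success_prob, key=lambda name: binary_entropy(success_prob[name]))


if __name__ == "__main__":
    # Hypothetical beliefs that the user accepts each book handover pose.
    beliefs = {"pose_a": 0.95, "pose_b": 0.50, "pose_c": 0.10}
    print("exploit only picks:", pick_exploit(beliefs))
    print("max entropy picks: ", pick_max_entropy(beliefs))
```

Under these assumed beliefs, the greedy rule keeps returning the pose that already works, while the entropy rule queries the pose whose outcome would teach the learner the most about the user's preferences.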