Few-Shot Transfer of Tool-Use Skills Using Human Demonstrations With Proximity and Tactile Sensing

Marina Y. Aoyama, S. Vijayakumar, Tetsuya Narita
Sony Group Corporation (inferred from GitHub URL)
IEEE Robotics and Automation Letters (2025)
Agent MM Pretraining

📝 Paper Summary

Robotic Manipulation, Sim-to-Real Transfer, Learning from Demonstration (LfD)
A robot learns to use diverse tools by pre-training on primitive motions in simulation with proximity and tactile sensors, then fine-tuning with a few human demonstrations in the real world.
Core Problem
Robots struggle to manipulate grasped tools because doing so involves complex, unobservable 'extrinsic' contacts between the tool and the environment, and these contacts vary greatly across tools and tasks.
Why it matters:
  • Current methods often fix tools to the robot arm, preventing seamless tool swapping or the use of human-designed tools.
  • Learning from scratch requires impractical amounts of human demonstration time.
  • Simulation offers data but suffers from a large 'sim-to-real' gap, especially for deformable tools and complex friction dynamics.
Concrete Example: A sponge (a soft tool) and a brush (rigid handle, soft bristles) produce very different contact forces and contact geometry on a curved surface. Without adaptation, a policy trained on one tool fails on the other because it has never seen that tool's friction and deformation dynamics.
Key Novelty
Multimodal Few-Shot Tool-Use Transfer Framework
  • Pre-trains a base policy in simulation on 'primitive motions' (simple repetitive actions) to learn general contact dynamics using tactile and proximity data.
  • Uses an encoder-decoder architecture where the encoder learns transferable contact features, and the decoder is fine-tuned with a small number of real-world human demonstrations.
  • Combines proximity sensors (to see local geometry before contact) and tactile sensors (to feel forces during contact) to estimate tool-environment interactions without explicit modeling.
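The two-stage idea in the bullets above (an encoder kept from simulation pre-training, a decoder adapted on a few real demonstrations) can be illustrated with a toy sketch. This is not the paper's implementation: the linear decoder, the `tanh` feature encoder, all dimensions, and the synthetic "demonstrations" are assumptions chosen only to keep the example small and dependency-free. The function names (`encode`, `decode`, `finetune_decoder`) are hypothetical.

```python
import math
import random

def encode(W_enc, proximity, tactile):
    """Frozen encoder: concatenate both modalities, then apply tanh(W x)."""
    x = list(proximity) + list(tactile)
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W_enc]

def decode(w_dec, features):
    """Linear decoder mapping contact features to a scalar action."""
    return sum(w * f for w, f in zip(w_dec, features))

def mse(W_enc, w_dec, demos):
    """Mean squared error between predicted and demonstrated actions."""
    errs = [(decode(w_dec, encode(W_enc, p, t)) - a) ** 2 for p, t, a in demos]
    return sum(errs) / len(errs)

def finetune_decoder(W_enc, w_dec, demos, lr=0.1, epochs=200):
    """Gradient descent on the decoder weights only; W_enc is never modified."""
    w = list(w_dec)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for p, t, a in demos:
            f = encode(W_enc, p, t)
            err = decode(w, f) - a
            for i, fi in enumerate(f):
                grad[i] += 2.0 * err * fi / len(demos)
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

rng = random.Random(0)
# Stand-in for encoder weights obtained from simulation pre-training (made-up values).
W_enc = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
w_dec_sim = [0.0, 0.0, 0.0]
# Five synthetic "real-world demonstrations": (proximity[2], tactile[2], action).
demos = [([rng.random(), rng.random()],
          [rng.random(), rng.random()],
          rng.uniform(-1, 1)) for _ in range(5)]

W_enc_before = [row[:] for row in W_enc]
loss_before = mse(W_enc, w_dec_sim, demos)
w_dec_real = finetune_decoder(W_enc, w_dec_sim, demos)
loss_after = mse(W_enc, w_dec_real, demos)
print(f"loss before fine-tuning: {loss_before:.4f}, after: {loss_after:.4f}")
```

Only the decoder weights change during fine-tuning, so the contact features learned in the first stage are reused unchanged. In this toy setting that is what lets a handful of samples suffice, since just three decoder parameters are fitted.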
Architecture
Figure 3: The Seq2Seq model architecture with LSTM-based encoder and decoder.
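For readers unfamiliar with the pattern, the following is a minimal forward-pass sketch of a generic LSTM encoder-decoder, written in plain Python. It is an illustration under stated assumptions, not the model from the paper: the sensor, hidden, and action dimensions, the random weights, the omitted biases, and the choice to feed each predicted action back into the decoder are all assumptions made for this example.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def init_lstm(input_dim, hidden_dim, rng):
    """One weight matrix per gate (input, forget, candidate, output), acting on [x; h]."""
    def mat():
        return [[rng.uniform(-0.1, 0.1) for _ in range(input_dim + hidden_dim)]
                for _ in range(hidden_dim)]
    return {"i": mat(), "f": mat(), "g": mat(), "o": mat()}

def lstm_step(params, x, h, c):
    """Standard LSTM cell update (biases omitted for brevity)."""
    xh = list(x) + list(h)
    i = [sigmoid(z) for z in matvec(params["i"], xh)]
    f = [sigmoid(z) for z in matvec(params["f"], xh)]
    g = [math.tanh(z) for z in matvec(params["g"], xh)]
    o = [sigmoid(z) for z in matvec(params["o"], xh)]
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new

def seq2seq_forward(enc, dec, W_out, sensor_seq, horizon, hidden_dim, action_dim):
    """Encode a sensor sequence, then roll the decoder out for `horizon` steps."""
    h, c = [0.0] * hidden_dim, [0.0] * hidden_dim
    for x in sensor_seq:                  # encoder consumes the observations
        h, c = lstm_step(enc, x, h, c)
    actions, prev = [], [0.0] * action_dim
    for _ in range(horizon):              # decoder starts from the encoder state
        h, c = lstm_step(dec, prev, h, c)
        prev = matvec(W_out, h)           # linear read-out of the next action
        actions.append(prev)
    return actions

# Hypothetical sizes: 4 sensor channels, 8 hidden units, 3 action dimensions.
SENSOR_DIM, HIDDEN_DIM, ACTION_DIM = 4, 8, 3
rng = random.Random(0)
encoder = init_lstm(SENSOR_DIM, HIDDEN_DIM, rng)
decoder = init_lstm(ACTION_DIM, HIDDEN_DIM, rng)
W_out = [[rng.uniform(-0.1, 0.1) for _ in range(HIDDEN_DIM)] for _ in range(ACTION_DIM)]

# Ten time steps of made-up sensor readings.
sensor_seq = [[rng.random() for _ in range(SENSOR_DIM)] for _ in range(10)]
actions = seq2seq_forward(encoder, decoder, W_out, sensor_seq,
                          horizon=5, hidden_dim=HIDDEN_DIM, action_dim=ACTION_DIM)
print(len(actions), len(actions[0]))
```

The encoder's final hidden and cell states are the only information passed to the decoder, which is why the encoder can be treated as a compact feature extractor for the observed sensor history.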
Evaluation Highlights
  • Successfully transfers surface-following skills to novel tools (sponge, brush) using only a small number of real-world demonstrations.
  • Outperforms baseline approaches (direct LfD without pre-training) in manipulating unattached, deformable tools.
  • Demonstrates that combining proximity and tactile sensing improves the identification of contact states and environmental geometry compared to using single modalities.
Breakthrough Assessment
7/10
Significant progress in manipulating *grasped* (not fixed) tools using multimodal sensing and few-shot learning. The sim-to-real strategy for extrinsic contact is a strong contribution, though the evaluation is limited to specific surface-following tasks.