Contact Coverage-Guided Exploration for General-Purpose Dexterous Manipulation

📝 Paper Summary

Dexterous Manipulation · Exploration in Reinforcement Learning · Intrinsic Rewards

CCGE enables general-purpose dexterous manipulation learning by rewarding agents for discovering novel contact patterns between specific fingers and object regions, conditioned on learned object state clusters.

Core Problem

Dexterous manipulation lacks a standard, general-purpose reward signal; existing methods rely on brittle task-specific priors or generic novelty metrics (state/dynamics) that fail to incentivize meaningful physical contact.

Why it matters:

Task-specific rewards (shaping) do not generalize, requiring manual engineering for every new task (e.g., singulation vs. reorientation)
Generic exploration methods (e.g., maximizing prediction error) often lead to task-irrelevant behaviors like waving hands in free space without touching the object
Force-based curiosity is unstable due to the non-smooth, discontinuous nature of contact forces in manipulation

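Concrete Example: In a reorientation task, a standard novelty-seeking agent might wave its fingers near the object to maximize state variance without ever touching it. CCGE instead ties its contact reward to finger-region contacts: the reward is largest when a finger touches a region of the object surface that it has rarely or never touched in the current object state, and it shrinks as the same pairing is repeated.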

Key Novelty

Contact Coverage-Guided Exploration (CCGE)

Explicitly models contact alignment as pairings between hand fingers and discretized object surface regions
Maintains a 'coverage counter' that tracks how often each finger has touched each object region, rewarding rare pairings
Contextualizes exploration using a learned hash of the object state, ensuring that contact strategies learned in one configuration (e.g., grasping) don't suppress exploration in another (e.g., reorienting)

Architecture

Overview of the CCGE pipeline showing the flow from state input to reward generation

Evaluation Highlights

Substantially improves training efficiency and success rates over existing exploration methods (qualitative finding from abstract)
Learned contact patterns transfer robustly to real-world robotic systems (qualitative finding from abstract)
Demonstrated across diverse tasks: cluttered object singulation, constrained object retrieval, in-hand reorientation, and bimanual manipulation

Breakthrough Assessment

8/10

Addresses a critical bottleneck in robotic manipulation—the lack of general-purpose rewards. By formalizing 'contact coverage', it offers a principled alternative to heuristic reward shaping.

⚙️ Technical Details

Problem Definition

Setting: Markov Decision Process (MDP) for robotic manipulation with continuous control

Inputs: Robot proprioception and object state information (point cloud)

Outputs: Continuous low-level control commands for the dexterous hand

Pipeline Flow

Object/Hand Abstraction (Points & Keypoints)
State Clustering (Autoencoder + SimHash)
Contact Detection (Force & Geometry)
Reward Calculation (Coverage Counter Update)

System Modules

Object State Encoder

Compresses object state (current + goal) into a discrete hash to contextualize exploration

Model or implementation: Autoencoder with SimHash projection
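
As a rough illustration of the hashing step, the sketch below applies SimHash to a latent vector: project it onto random Gaussian directions and keep only the sign bits. The latent dimension, number of bits, seed, and the helper names `make_projection` and `simhash` are assumptions made for illustration; the autoencoder that would produce the latent is omitted, and the paper's actual settings are not given in this summary.

```python
import random


def make_projection(num_bits, dim, seed=0):
    """Draw a fixed random Gaussian matrix with one row per hash bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def simhash(vector, projection):
    """Return a tuple of sign bits, one per projection row."""
    bits = []
    for row in projection:
        dot = sum(w * x for w, x in zip(row, vector))
        bits.append(1 if dot >= 0.0 else 0)
    return tuple(bits)


# Hypothetical 4-dimensional latent hashed to an 8-bit cluster key.
projection = make_projection(num_bits=8, dim=4, seed=1)
latent = [0.2, -1.0, 0.5, 0.3]
cluster_key = simhash(latent, projection)
```

Because only the signs of the projections are kept, vectors pointing in similar directions tend to share most bits, which is what lets nearby object states fall into the same cluster key.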

Contact Detector

Identifies valid physical interactions between fingers and object regions

Model or implementation: Geometric & Force Thresholding
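
One plausible reading of "geometric and force thresholding" is sketched below: a finger counts as touching a region only if its measured contact force exceeds a threshold and its keypoint lies close enough to the nearest object surface point. The function name `detect_contacts`, the nearest-point rule, and both threshold values are hypothetical; the summary does not state the actual rule or values.

```python
import math


def detect_contacts(finger_positions, finger_forces, surface_points, point_regions,
                    dist_threshold=0.01, force_threshold=0.5):
    """Return (finger_index, region_id) pairs that pass both thresholds.

    dist_threshold (metres) and force_threshold (newtons) are illustrative
    placeholders, not values from the paper.
    """
    contacts = []
    for finger, (pos, force) in enumerate(zip(finger_positions, finger_forces)):
        if force < force_threshold:
            continue  # force test failed: too weak to count as contact
        # Geometric test: find the closest surface point to this finger keypoint.
        nearest = min(range(len(surface_points)),
                      key=lambda i: math.dist(pos, surface_points[i]))
        if math.dist(pos, surface_points[nearest]) <= dist_threshold:
            contacts.append((finger, point_regions[nearest]))
    return contacts
```

Requiring both tests at once is a design choice in this sketch: the force test filters out fingers hovering near the surface, and the distance test filters out force readings caused by something other than the object.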

Coverage Counter

Tracks the frequency of specific finger-region interactions for the current state cluster

Model or implementation: Lookup Table C[s, f, k]
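
The lookup table C[s, f, k] can be sketched as a dictionary keyed by (state cluster, finger, region). The class name `CoverageCounter` and its methods are assumptions for illustration; the source only specifies that counts are indexed by these three quantities.

```python
class CoverageCounter:
    """Counts how often finger f has touched region k within state cluster s."""

    def __init__(self):
        self._counts = {}

    def count(self, state_key, finger, region):
        """Return C[s, f, k]; unseen triples count as zero."""
        return self._counts.get((state_key, finger, region), 0)

    def update(self, state_key, contacts):
        """Increment the counter for every detected (finger, region) pair."""
        for finger, region in contacts:
            key = (state_key, finger, region)
            self._counts[key] = self._counts.get(key, 0) + 1


# Hypothetical usage: the same finger-region pair is tracked per state cluster.
counter = CoverageCounter()
counter.update("grasp", [(0, 3), (0, 3)])
seen_in_grasp = counter.count("grasp", 0, 3)        # touched twice in this cluster
seen_in_reorient = counter.count("reorient", 0, 3)  # still unexplored in this one
```

Keying on the state cluster is what keeps counts from one configuration from suppressing exploration in another: the pair (finger 0, region 3) is well explored under the "grasp" key but still novel under "reorient".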

Novel Architectural Elements

State-conditioned contact counters: Maintaining separate exploration statistics for different object state clusters (discovered via hashing) to prevent cross-state interference

Modeling

Base Model: PPO (Proximal Policy Optimization) for policy training

Training Method: Reinforcement Learning with Intrinsic Rewards

Objective Functions:

Purpose: Encode the object state space into discrete clusters.

Formally: Autoencoder reconstruction loss + binary regularization loss (pushing latents toward 0 or 1)

Purpose: Incentivize novel contacts (post-contact).

Formally: r_contact = g(C_{s,f,k}), where g(c) = 1/sqrt(c+1) and C_{s,f,k} is the number of times finger f has touched region k in state cluster s

Purpose: Guide the hand toward novel regions (pre-contact).

Formally: Energy-based reaching reward that minimizes a weighted distance to under-explored regions

Purpose: Mitigate detachment and short-sightedness.

Formally: Scaled rewards are clipped so that they are positive only when they exceed the episodic cumulative maximum: r ← [r - r_max]_+
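
Three of the objectives above are simple enough to sketch directly. The function and class names are hypothetical. The regularizer uses the min(b^2, (1-b)^2) form common in hashing-based count exploration, which is an assumption here because the summary only says latents are pushed to 0 or 1. The clipping starts its running maximum at zero, which is likewise an assumption. The pre-contact energy term is left out because its exact form is not given.

```python
import math


def binary_regularizer(latent):
    """Mean penalty that is zero only when every latent value is exactly 0 or 1.

    Assumed form: min(b^2, (1 - b)^2) per latent dimension.
    """
    return sum(min(b * b, (1.0 - b) ** 2) for b in latent) / len(latent)


def contact_novelty(count):
    """Post-contact reward g(c) = 1 / sqrt(c + 1) from the source."""
    return 1.0 / math.sqrt(count + 1)


class EpisodicMaxClip:
    """Emit only the part of a reward that exceeds the episode's running maximum."""

    def __init__(self):
        self.running_max = 0.0  # assumed starting value

    def reset(self):
        """Call at the start of every episode."""
        self.running_max = 0.0

    def step(self, reward):
        clipped = max(0.0, reward - self.running_max)  # r <- [r - r_max]_+
        self.running_max = max(self.running_max, reward)
        return clipped
```

Under this reading, g gives a reward of 1 for a finger-region pair that has never been seen in the current state cluster and decays as the count grows, while the clipping means that revisiting an already achieved reward level within an episode yields nothing extra.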

Compute: Not reported in the provided text

Comparison to Prior Work

vs. State Novelty: CCGE focuses strictly on physical contact events rather than global state variance
vs. HaC: CCGE uses discrete contact counts rather than continuous force prediction, avoiding instability from force spikes
vs. Task Shaping: CCGE is general-purpose and discovers strategies autonomously without engineered approach/lift stages

Limitations

Relies on predefined object surface points and hand keypoints
Discrete hashing might collapse distinct states into one cluster if the autoencoder is not well regularized
Quantitative results (exact success rates) not available in the provided text excerpt

Reproducibility

Project page: https://contact-coverage-guided-exploration.github.io

Code is promised to be made publicly available. The algorithm logic (hashing, contact detection, reward formulas) is fully described in the text.

📊 Experiments & Results

Evaluation Setup

Simulation-based training with PPO across multiple dexterous manipulation tasks

Benchmarks:

Cluttered Object Singulation (Dexterous Manipulation)
Constrained Object Retrieval (Dexterous Manipulation)
In-hand Reorientation (Dexterous Manipulation)
Bimanual Manipulation (Dexterous Manipulation)

Metrics:

Success Rate
Training Efficiency (Convergence Speed)

Statistical methodology: Not explicitly reported in the provided text

Main Takeaways

CCGE substantially improves training efficiency and success rates compared to existing exploration methods (State/Dynamics novelty) across all tested tasks
The method mitigates 'cross-state interference' by using state-conditioned counters, so a contact pattern already explored in one task phase can still earn exploration reward in another
Qualitative results suggest the policies learned with CCGE transfer robustly to real-world systems, implying the discovered contact strategies are physically realistic

📚 Prerequisite Knowledge

Prerequisites

Reinforcement Learning (MDPs, PPO)
Intrinsic Motivation / Exploration Rewards
Locality-Sensitive Hashing (SimHash)

Key Terms

CCGE: Contact Coverage-Guided Exploration—the proposed method that rewards novel finger-object contact patterns

SimHash: A locality-sensitive hashing technique that maps high-dimensional vectors to compact binary fingerprints (hashes) using the signs of random projections, so that similar vectors tend to receive similar hashes

PPO: Proximal Policy Optimization—a policy gradient reinforcement learning algorithm used as the backbone optimizer

Intrinsic Reward: A reward signal generated internally by the agent (e.g., for curiosity) rather than given by the environment

Post-contact reward: A sparse reward given only when physical contact is detected, incentivizing novel interactions

Pre-contact reward: A dense energy-based reward guiding the hand toward regions likely to yield novel contacts

Autoencoder: A neural network trained to compress data into a lower-dimensional latent representation and reconstruct it