School of Computing, National University of Singapore
arXiv
(2026)
RL · Reasoning
📝 Paper Summary
Dexterous Manipulation · Exploration in Reinforcement Learning · Intrinsic Rewards
CCGE enables general-purpose dexterous manipulation learning by rewarding agents for discovering novel contact patterns between specific fingers and object regions, conditioned on learned object state clusters.
Core Problem
Dexterous manipulation lacks a standard, general-purpose reward signal; existing methods rely on brittle task-specific priors or generic novelty metrics (state/dynamics) that fail to incentivize meaningful physical contact.
Why it matters:
Task-specific rewards (shaping) do not generalize, requiring manual engineering for every new task (e.g., singulation vs. reorientation)
Generic exploration methods (e.g., maximizing prediction error) often lead to task-irrelevant behaviors like waving hands in free space without touching the object
Force-based curiosity is unstable due to the non-smooth, discontinuous nature of contact forces in manipulation
Concrete Example: In a reorientation task, a standard novelty-seeking agent might wave its fingers near the object to maximize state variance without ever touching it. CCGE instead rewards the agent only when a finger touches a previously untouched region of the object surface.
Key Novelty
Contact Coverage-Guided Exploration (CCGE)
Explicitly models contact alignment as pairings between hand fingers and discretized object surface regions
Maintains a 'coverage counter' that tracks how often each finger has touched each object region, rewarding rare pairings
Contextualizes exploration using a learned hash of the object state, ensuring that contact strategies learned in one configuration (e.g., grasping) don't suppress exploration in another (e.g., reorienting)
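The summary does not state the exact bonus formula. A minimal way to write the idea down, assuming the standard inverse-square-root count bonus from count-based exploration, is shown below. Here $\mathcal{K}_t$ is the set of finger-region pairs in contact at step $t$, $h(s_t)$ is the hashed object-state cluster, $C$ is incremented before the bonus is computed, and the mixing weight $\beta$ with an extrinsic task reward is a hypothetical choice, not a confirmed detail of CCGE.

```latex
r^{\text{int}}_t = \sum_{(f,k) \in \mathcal{K}_t} \frac{1}{\sqrt{C[h(s_t), f, k]}},
\qquad
r_t = r^{\text{ext}}_t + \beta \, r^{\text{int}}_t
```

Rare pairings (small $C$) earn close to the maximum bonus of 1 per pair, while frequently repeated contacts decay toward zero, which is the "reward rare pairings" behavior described above.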
Architecture
Figure: Overview of the CCGE pipeline, showing the flow from state input to reward generation.
Evaluation Highlights
Substantially improves training efficiency and success rates over existing exploration methods (qualitative finding from abstract)
Learned contact patterns transfer robustly to real-world robotic systems (qualitative finding from abstract)
Demonstrated across diverse tasks: cluttered object singulation, constrained object retrieval, in-hand reorientation, and bimanual manipulation
Breakthrough Assessment
8/10
Addresses a critical bottleneck in robotic manipulation: the lack of general-purpose rewards. By formalizing 'contact coverage', it offers a principled alternative to heuristic reward shaping.
⚙️ Technical Details
Problem Definition
Setting: Markov Decision Process (MDP) for robotic manipulation with continuous control
Inputs: Robot proprioception and object state information (point cloud)
Outputs: Continuous low-level control commands for the dexterous hand
Pipeline Flow
Object/Hand Abstraction (Points & Keypoints)
State Clustering (Autoencoder + SimHash)
Contact Detection (Force & Geometry)
Reward Calculation (Coverage Counter Update)
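The first pipeline step needs a fixed set of object surface regions that contacts can be counted against. The summary does not say how the surface is discretized; the sketch below assumes K regions defined by farthest-point-sampled anchor points, with every surface point labeled by its nearest anchor. The function names `farthest_point_sampling` and `assign_regions` are hypothetical.

```python
import numpy as np


def farthest_point_sampling(points: np.ndarray, k: int, seed: int = 0) -> np.ndarray:
    """Pick k well-spread anchor indices from an (N, 3) object point cloud."""
    rng = np.random.default_rng(seed)
    n = points.shape[0]
    chosen = [int(rng.integers(n))]
    # Distance from every point to its nearest anchor chosen so far.
    dist = np.linalg.norm(points - points[chosen[0]], axis=1)
    for _ in range(1, k):
        nxt = int(np.argmax(dist))  # the point farthest from all current anchors
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return np.array(chosen)


def assign_regions(points: np.ndarray, anchors: np.ndarray) -> np.ndarray:
    """Label each surface point with the index of its nearest anchor (its region id k)."""
    d = np.linalg.norm(points[:, None, :] - anchors[None, :, :], axis=2)  # (N, K)
    return np.argmin(d, axis=1)
```

For example, `assign_regions(pts, pts[farthest_point_sampling(pts, 8)])` splits a cloud into 8 roughly even patches. Farthest-point sampling is used here only because it spreads regions evenly over the surface; k-means or mesh-based segmentation would be equally plausible choices.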
System Modules
Object State Encoder
Compresses object state (current + goal) into a discrete hash to contextualize exploration
Model or implementation: Autoencoder with SimHash projection
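A minimal sketch of the hashing step, assuming the autoencoder (not shown) already maps the current and goal object state to a latent vector whose entries lie near 0 or 1. The code rounds that latent and applies a SimHash-style sign of a fixed random Gaussian projection, packing the bits into an integer key. The class name `SimHashEncoder` and the 16-bit default are hypothetical.

```python
import numpy as np


class SimHashEncoder:
    """Map a near-binary autoencoder latent to a discrete state-cluster id."""

    def __init__(self, latent_dim: int, n_bits: int = 16, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Fixed Gaussian projection matrix (n_bits x latent_dim), as in SimHash.
        self.A = rng.standard_normal((n_bits, latent_dim))
        # Bit weights 1, 2, 4, ... used to pack the sign bits into one integer.
        self.powers = 1 << np.arange(n_bits)

    def __call__(self, latent: np.ndarray) -> int:
        b = np.round(latent)          # snap the latent to a binary code
        bits = (self.A @ b) > 0       # sign of the random projection
        return int(np.dot(bits.astype(np.int64), self.powers))
```

Because the latent is rounded first, nearby object states that share a binary code land in the same cluster, e.g. `enc(np.array([0.9, 0.1, 0.8, 0.2]))` and `enc(np.array([0.95, 0.05, 0.7, 0.3]))` return the same key.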
Contact Detector
Identifies valid physical interactions between fingers and object regions
Model or implementation: Geometric & Force Thresholding
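One way to combine the geometric and force checks, sketched under assumptions: each finger is abstracted to a single fingertip keypoint with a scalar contact-force magnitude, and a contact counts only if the force exceeds a threshold and the fingertip lies within a distance threshold of the object surface. The region label of the nearest surface point gives k. The function name and both threshold values are hypothetical.

```python
import numpy as np


def detect_contacts(fingertips, finger_forces, obj_points, region_ids,
                    dist_thresh=0.01, force_thresh=0.5):
    """Return the set of (finger, region) pairs currently in valid contact.

    fingertips:    (F, 3) fingertip keypoint positions
    finger_forces: (F,) contact-force magnitudes, one per finger
    obj_points:    (N, 3) object surface points
    region_ids:    (N,) region label of each surface point
    """
    pairs = set()
    for f, (tip, force) in enumerate(zip(fingertips, finger_forces)):
        if force < force_thresh:  # force check filters grazing or spurious touches
            continue
        d = np.linalg.norm(obj_points - tip, axis=1)
        j = int(np.argmin(d))
        if d[j] < dist_thresh:    # geometric check: fingertip is on the surface
            pairs.add((f, int(region_ids[j])))
    return pairs
```

Requiring both conditions is a hedge against the noisy, discontinuous force signals mentioned under Core Problem: geometry alone admits near-misses, while force alone cannot tell which region was touched.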
Coverage Counter
Tracks the frequency of specific finger-region interactions for the current state cluster
Model or implementation: Lookup Table C[s, f, k] (s: object-state cluster, f: finger, k: object region)
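A minimal sketch of the state-conditioned counter, assuming the inverse-square-root count bonus (the summary does not give the exact formula). Keys are (state cluster, finger, region) triples, so the same finger-region pair is counted separately in each object-state cluster. The class name `CoverageCounter` is hypothetical.

```python
import math
from collections import defaultdict


class CoverageCounter:
    """State-conditioned contact counts C[s, f, k] with a count-based bonus."""

    def __init__(self):
        # (state_hash, finger, region) -> number of times this contact was seen
        self.counts = defaultdict(int)

    def reward(self, state_hash, contacts):
        """Update counts for the current contacts and return the intrinsic reward."""
        r = 0.0
        for f, k in contacts:
            key = (state_hash, f, k)
            self.counts[key] += 1
            r += 1.0 / math.sqrt(self.counts[key])  # rare pairings earn more
        return r
```

Touching region 3 with finger 0 earns 1.0 the first time and about 0.707 the second time in cluster 7, but earns the full 1.0 again in a new cluster 9. This is the mechanism that keeps a contact pattern already exhausted in one phase (e.g. grasping) from suppressing exploration in another (e.g. reorienting).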
Novel Architectural Elements
State-conditioned contact counters: Maintaining separate exploration statistics for different object state clusters (discovered via hashing) to prevent cross-state interference
Modeling
Base Model: PPO (Proximal Policy Optimization) for policy training
Training Method: Reinforcement Learning with Intrinsic Rewards
Objective Functions:
Purpose: Encodes the object state space into discrete clusters.
Formally: Autoencoder reconstruction loss + Binary regularization loss (pushing latents to 0 or 1)
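A sketch of this objective under stated assumptions: squared-error reconstruction, plus a binary regularizer min(b², (1 − b)²) per latent unit that is zero only when the unit sits exactly at 0 or 1. The regularizer form follows the learned-hash exploration literature; the squared-error choice and the weight `lam=10.0` are hypothetical, not taken from the paper.

```python
import numpy as np


def autoencoder_loss(x, x_recon, b, lam=10.0):
    """Reconstruction loss plus a penalty pushing latent units toward 0 or 1.

    x, x_recon: (batch, state_dim) object states and their reconstructions
    b:          (batch, latent_dim) latent activations in [0, 1]
    """
    recon = np.mean(np.sum((x - x_recon) ** 2, axis=1))
    # Distance of each unit to the nearer of {0, 1}; zero for an exactly binary code.
    binary = np.mean(np.sum(np.minimum(b ** 2, (1.0 - b) ** 2), axis=1))
    return recon + lam * binary
```

The penalty peaks at b = 0.5, so minimizing it drives the code toward crisp bits that survive the rounding step before hashing. In a real training loop this would be written in an autodiff framework rather than NumPy.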
Statistical methodology: Not explicitly reported in the provided text
Main Takeaways
CCGE substantially improves training efficiency and success rates compared to existing exploration methods (State/Dynamics novelty) across all tested tasks
The method successfully mitigates 'cross-state interference' by using state-conditioned counters, allowing agents to reuse contact patterns in different task phases
Qualitative results suggest the policies learned with CCGE transfer robustly to real-world systems, implying the discovered contact strategies are physically realistic