_comment: REQUIRED: Define ALL technical terms, acronyms, and method names used ANYWHERE in the entire summary. After drafting the summary, perform a MANDATORY POST-DRAFT SCAN: check every section individually (Core.one_sentence_thesis, evaluation_highlights, core_problem, Technical_details, Experiments.key_results notes, Figures descriptions and key_insights). HIGH-VISIBILITY RULE: Terms appearing in one_sentence_thesis, evaluation_highlights, or figure key_insights MUST be defined—these are the first things readers see. COMMONLY MISSED: PPO, DPO, MARL, dense retrieval, silver labels, cosine schedule, clipped surrogate objective, Top-k, greedy decoding, beam search, logit, ViT, CLIP, Pareto improvement, BLEU, ROUGE, perplexity, attention heads, parameter sharing, warm start, convex combination, sawtooth profile, length-normalized attention ratio, NTP. If in doubt, define it.
HOI: Human-Object Interaction—tasks where a robot or human actively manipulates an object (e.g., carrying a box)
PPO: Proximal Policy Optimization, a policy-gradient reinforcement learning algorithm that limits how far each update can move the policy, for stable training; used here for the inner-loop control policy
SAC: Soft Actor-Critic—an entropy-regularized reinforcement learning algorithm used here for the outer-loop meta-policy to learn reward weights
IK: Inverse Kinematics—a mathematical process to calculate the joint angles needed to position a robot's end-effector (hand) at a specific target point
SMPL: Skinned Multi-Person Linear model—a standard parametric model for representing human body shape and pose
IsaacGym: A high-performance physics simulator from NVIDIA used for parallel reinforcement learning training
Unitree G1: A specific commercial humanoid robot hardware platform used for real-world validation
FoundationPose: A computer vision method for estimating and tracking the 6D pose (3D position and 3D orientation) of objects from RGB-D (color plus depth) images
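PD controller: Proportional-Derivative controller, a feedback controller whose command is a gain times the error between target and measured position (proportional term) plus a gain times the rate of change of that error (derivative term); in robot control it typically converts target joint positions into joint torques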
Domain Randomization: A technique to improve sim-to-real transfer by randomly varying simulation parameters (e.g., friction, mass) during training so the policy does not overfit to a single simulated setting
Interaction Graph: A feature representation that encodes the distances between key points on the robot and the object to guide contact learning
MDP: Markov Decision Process—a mathematical framework for modeling decision-making in situations where outcomes are partly random and partly under the control of a decision maker