DP-MDP: Dynamic-Parameter Markov Decision Process—an MDP where transition/reward functions depend on a hidden parameter that evolves over time
HiP: Hidden Parameter—a latent variable (z) that characterizes the current environment dynamics (e.g., friction, mass)
CPC: Contrastive Predictive Coding—an unsupervised learning method that learns representations by predicting future observations in a latent space
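InfoNCE: Information Noise Contrastive Estimation—a contrastive loss that trains a model to pick the true (positive) sample out of a set of negatives; minimizing it maximizes a lower bound on the mutual information between the paired representations (e.g., a context and a future observation)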
TD3+BC: Twin Delayed Deep Deterministic Policy Gradient with Behavior Cloning—an offline RL algorithm that adds a behavior-cloning term to the TD3 policy update so the learned policy stays close to the actions in the dataset
HiP-MDP: Hidden-Parameter MDP—an MDP where the hidden parameter is sampled once and stays constant (unlike DP-MDP where it evolves)
BOReL: Bayes Adaptive Offline RL—a baseline method that infers hidden parameters using a Variational Autoencoder
ContraBAR: Contrastive Bayes-Adaptive Deep RL—a baseline method using CPC for belief state inference
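
To make the InfoNCE entry above concrete, here is a minimal NumPy sketch of the loss. It is an illustration only, not the implementation used by any of the methods listed: the function name `info_nce_loss`, the cosine-similarity scoring, the `temperature` value, and the use of the other rows in the batch as negatives are all assumptions made for this example.

```python
import numpy as np

def info_nce_loss(context, candidates, temperature=0.1):
    """Illustrative InfoNCE loss with in-batch negatives.

    context:    array of shape (N, D), e.g. encodings of past observations.
    candidates: array of shape (N, D); row i is the positive for context row i,
                and every other row serves as a negative for it.
    """
    # Assumption: score pairs by cosine similarity (L2-normalize, then dot product).
    c = context / np.linalg.norm(context, axis=1, keepdims=True)
    k = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    logits = (c @ k.T) / temperature            # shape (N, N)

    # Numerically stable log-softmax over each row of scores.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Cross-entropy where the correct "class" for row i is column i.
    return float(-np.mean(np.diag(log_probs)))

# Usage: correctly paired rows should score a lower loss than mismatched rows.
aligned = info_nce_loss(np.eye(4), np.eye(4))
mismatched = info_nce_loss(np.eye(4), np.roll(np.eye(4), 1, axis=0))
```

When every candidate looks identical to the model, the loss equals log N for a batch of N pairs, which is the chance level for picking the positive out of N options.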