Markovian reward: The immediate, unobserved reward r_t associated with a state-action pair at time t, as opposed to the delayed, accumulated return that is actually observed
return decomposition: Techniques to break down a single long-term return value into a sequence of individual proxy rewards for each time step
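As a rough illustration of the core constraint behind return decomposition (a toy sketch, not any particular method): the per-step proxy rewards must sum back to the single observed episodic return. The uniform split below is the simplest baseline satisfying that constraint; the function name is hypothetical.

```python
import numpy as np

def redistribute_uniform(episodic_return, num_steps):
    """Naive baseline: split a single episodic return evenly across
    all time steps, so the proxy rewards sum to the observed return."""
    return np.full(num_steps, episodic_return / num_steps)

# A 5-step episode whose only feedback is a final return of 10.0.
proxy_rewards = redistribute_uniform(10.0, 5)
```

Real return-decomposition methods replace the uniform split with learned, credit-weighted proxy rewards, but they typically preserve this sum-consistency property.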
compact representation: A subset of state dimensions identified as causally relevant to the reward function, used to reduce the input space for the policy
Gumbel-Softmax: A reparameterization trick allowing differentiable sampling from categorical distributions, used here to learn binary causal masks
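A minimal NumPy sketch of the Gumbel-Softmax sampling step, applied to per-dimension keep/drop logits as a stand-in for learning binary masks (names and logits here are hypothetical; autograd frameworks provide a differentiable version, e.g. `torch.nn.functional.gumbel_softmax`):

```python
import numpy as np

def gumbel_softmax_sample(logits, tau=1.0, rng=None):
    """Sample from a categorical distribution via the Gumbel-Softmax
    trick: add Gumbel(0, 1) noise to the logits, then apply a
    temperature-scaled softmax. Lower tau -> closer to one-hot."""
    rng = rng or np.random.default_rng(0)
    u = rng.uniform(1e-9, 1.0, size=logits.shape)
    gumbel = -np.log(-np.log(u))  # Gumbel(0, 1) noise
    z = (logits + gumbel) / tau
    e = np.exp(z - z.max(axis=-1, keepdims=True))  # stable softmax
    return e / e.sum(axis=-1, keepdims=True)

# Binary mask over 4 state dimensions: one keep/drop categorical each,
# column 0 = "keep", column 1 = "drop".
logits = np.array([[2.0, -2.0], [-2.0, 2.0], [0.5, -0.5], [0.0, 0.0]])
soft = gumbel_softmax_sample(logits, tau=0.5)
mask = (soft.argmax(axis=-1) == 0).astype(float)  # hard "keep" mask
```

In a training loop the soft sample (or a straight-through hard sample) keeps the mask differentiable with respect to the logits, which is what makes the binary causal masks learnable by gradient descent.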
RUDDER: Return Decomposition for Delayed Rewards—a baseline method that uses LSTMs to redistribute rewards to key events in a sequence
MDP: Markov Decision Process—a mathematical framework for modeling decision making where outcomes are partly random and partly under the control of a decision maker
DBN: Dynamic Bayesian Network—a graphical model that relates variables to each other over adjacent time steps