World Model: A model that learns the environment's dynamics (transition and reward functions), allowing the agent to simulate future trajectories and plan without further interaction with the real environment.
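A minimal sketch of the idea, under assumed simplifications: fit a linear dynamics model to logged transitions, then roll out imagined trajectories from it without touching the real environment. The linear form, the synthetic data, and all names here are illustrative; a practical world model would be a neural network and would also predict rewards.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground-truth dynamics, used only to generate training data.
A_true = np.array([[0.9, 0.1], [0.0, 0.95]])
B_true = np.array([[0.5], [0.2]])

states = rng.normal(size=(500, 2))
actions = rng.normal(size=(500, 1))
next_states = states @ A_true.T + actions @ B_true.T

# Learn the transition function s' = f(s, a) by least squares on [s, a] -> s'.
X = np.hstack([states, actions])                      # (500, 3)
W, *_ = np.linalg.lstsq(X, next_states, rcond=None)   # (3, 2)

def simulate(s, action_seq):
    """Roll out the learned model: imagined trajectory, no env interaction."""
    traj = [s]
    for a in action_seq:
        s = np.hstack([s, a]) @ W
        traj.append(s)
    return np.stack(traj)

traj = simulate(np.zeros(2), [np.ones(1)] * 5)  # 5 imagined steps
```

With noiseless linear data the least-squares fit recovers the true dynamics, so imagined rollouts match real ones; with a learned neural model the rollouts are approximations whose error compounds with horizon.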
Offline RL: Learning optimal policies from static datasets of previously collected experience without interacting with the environment during training.
Meta-RL: Meta-Reinforcement Learning—learning a learning algorithm or policy that can quickly adapt to new, unseen tasks.
Decision Transformer (DT): An approach that casts RL as a sequence modeling problem, using transformers to predict actions given states and desired returns.
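A small sketch of the input layout this implies, with illustrative names: each timestep contributes a (return-to-go, state, action) triple, and the model is trained to predict the action token conditioned on the desired return and the state.

```python
import numpy as np

def returns_to_go(rewards):
    """Suffix sums: R_t = sum of rewards from step t to the end (undiscounted)."""
    return np.cumsum(rewards[::-1])[::-1]

def interleave(rtg, states, actions):
    """Token order used for sequence modeling: (R_1, s_1, a_1, R_2, s_2, a_2, ...)."""
    seq = []
    for r, s, a in zip(rtg, states, actions):
        seq.extend([("rtg", r), ("state", s), ("action", a)])
    return seq

rewards = np.array([1.0, 0.0, 2.0])
rtg = returns_to_go(rewards)  # [3.0, 2.0, 2.0]
seq = interleave(rtg, ["s1", "s2", "s3"], ["a1", "a2", "a3"])
```

At evaluation time the initial return-to-go token is set to the return the user wants the policy to achieve, and it is decremented by each observed reward.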
Inductive Bias: Assumptions built into a learning algorithm that help it generalize to new data (e.g., using prediction error to identify informative trajectory segments).
Disentanglement: Separating different factors of variation in the data—here, separating task-specific dynamics (environment) from behavior-specific features (policy).
Causal Transformer: A transformer model that attends only to past and current tokens (masking future tokens) to respect temporal causality.
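The masking itself is simple: a lower-triangular matrix that lets position i attend only to positions j <= i. A minimal sketch (function names are illustrative):

```python
import numpy as np

def causal_mask(T):
    """True where attention is allowed: position i sees positions j <= i."""
    return np.tril(np.ones((T, T), dtype=bool))

def masked_attention_weights(scores, mask):
    """Set masked (future) scores to -inf before softmax, giving them zero weight."""
    scores = np.where(mask, scores, -np.inf)
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

m = causal_mask(4)
w = masked_attention_weights(np.zeros((4, 4)), m)
# With uniform scores, row i spreads weight evenly over the first i+1 positions.
```

Frameworks expose the same idea directly, e.g. a square subsequent mask in standard transformer libraries.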
Zero-shot Generalization: The ability to perform a new task without any task-specific examples or fine-tuning.
Few-shot Generalization: The ability to adapt to a new task given only a small number of examples (context).