D3QN: Dueling Double Deep Q-Network, an RL algorithm combining double Q-learning (to reduce overestimation) and a dueling architecture (separating value and advantage streams).
Behavioral Cloning (BC): A method where an agent learns a policy by supervised learning, training its actions to match those of an expert demonstrator.
Curriculum Learning: A training strategy where the agent is presented with tasks of increasing difficulty, often guided by a teacher.
Asymmetric Self Play (ASP): A training setup where a teacher proposes goals and a student attempts to achieve them; depending on the variant, the teacher is rewarded when the student fails (adversarial) or when the proposed goal is appropriate for the student.
Hindsight Experience Replay: A technique in goal-conditioned RL where the agent learns from failures by pretending the state it actually reached was the intended goal.
Zone of Proximal Development: The set of tasks that a learner cannot do alone but can achieve with guidance or demonstration.
Sparse Reward: A reward signal that is non-zero only very infrequently, which makes learning difficult.
Grounded Language Learning: Learning the meaning of language by mapping it to physical actions, objects, or sensory data in an environment.
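
Illustration for the D3QN entry: the two ingredients named there can be sketched in a few lines of plain Python. This is a minimal sketch, not a reference implementation. The function names are hypothetical, and the mean-subtracted way of combining the two streams is an assumption (a common choice, but the entry above does not specify one).

```python
from typing import List


def dueling_q_values(value: float, advantages: List[float]) -> List[float]:
    """Combine the value and advantage streams into per-action Q-values.

    Assumed aggregation: Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a').
    """
    mean_advantage = sum(advantages) / len(advantages)
    return [value + a - mean_advantage for a in advantages]


def double_q_target(
    reward: float,
    gamma: float,
    done: bool,
    next_q_online: List[float],
    next_q_target: List[float],
) -> float:
    """Double Q-learning target for one transition.

    The online network selects the next action and the target network
    evaluates it, which is what reduces overestimation.
    """
    if done:
        # No bootstrapping from a terminal state.
        return reward
    best_action = max(range(len(next_q_online)), key=lambda i: next_q_online[i])
    return reward + gamma * next_q_target[best_action]


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the call shape.
    print(dueling_q_values(1.0, [1.0, 2.0, 3.0]))
    print(double_q_target(1.0, 0.5, False, [0.0, 2.0], [4.0, 1.0]))
```

In a full agent the lists above would be the outputs of neural networks; here they are plain floats so the arithmetic is easy to follow.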