World Model: An internal representation of the environment that allows an agent to simulate and predict the consequences of actions without actually performing them
JEPA: Joint Embedding Predictive Architecture, a model architecture that learns to predict the representations of missing or future parts of its input in an abstract embedding space rather than reconstructing raw pixels or tokens, so that capacity is not spent on unpredictable low-level detail
VLM: Vision-Language Model, a model trained on both images and text to understand and generate content across the two modalities
Dyadic interaction: Interaction between two individuals (e.g., human-human or human-agent), involving complex turn-taking and non-verbal cues
Egocentric perception: Perceiving the world from a first-person perspective (e.g., through smart glasses), as opposed to from a static third-person camera
RLHF: Reinforcement Learning from Human Feedback, a training method that fine-tunes a model against a reward signal learned from human preference judgments, in order to align its outputs with those preferences
Hallucination: When an AI model generates plausible-sounding but factually incorrect or physically impossible information
Zero-shot: The ability of a model to perform a task it was not explicitly trained on, without any task-specific examples, typically guided only by a natural-language instruction in the prompt
NPC: Non-Player Character, an entity in a game or virtual world controlled by the computer rather than by a user