ControlNet: A neural network structure to add spatial conditioning controls to large pre-trained text-to-image diffusion models
LoRA: Low-Rank Adaptation, a technique to fine-tune large models by freezing the pre-trained weights and injecting trainable low-rank decomposition matrices
IP-Adapter: Image Prompt Adapter, a method that enables image prompting by using separate (decoupled) cross-attention layers for text features and image features
CLIP: Contrastive Language-Image Pre-training, a model trained to map images and text to a shared embedding space, often used for semantic guidance
ID embedding: A vector representation of facial identity extracted by a face recognition model (e.g., InsightFace's antelopev2), capturing strong semantic information about a person's identity
UNet: The core neural network architecture used in Stable Diffusion for denoising images in the latent space
IdentityNet: InstantID's specialized ControlNet module that uses facial landmarks as its spatial condition (in place of fine-grained OpenPose keypoints) and ID embeddings as its cross-attention condition (in place of text prompts)
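
The LoRA entry above can be made concrete with a small sketch. This is an illustration in plain Python, not code from any particular library: the names `lora_forward`, `down`, `up`, and `alpha`, the matrix sizes, and the `alpha / rank` scaling are all assumptions chosen for the example. It shows a frozen weight matrix combined with a trainable low-rank product, and compares the number of trainable parameters.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_forward(x, w_frozen, down, up, alpha):
    """Compute x @ W + (alpha / r) * (x @ down @ up) without merging the weights.

    w_frozen: d_in x d_out, never updated during fine-tuning.
    down:     d_in x r, trainable (hypothetical name).
    up:       r x d_out, trainable (hypothetical name).
    """
    rank = len(up)
    scale = alpha / rank
    base = matmul(x, w_frozen)
    update = matmul(matmul(x, down), up)
    return [[b + scale * u for b, u in zip(b_row, u_row)]
            for b_row, u_row in zip(base, update)]


random.seed(0)
d_in, d_out, rank = 8, 6, 2

w_frozen = [[random.gauss(0, 1) for _ in range(d_out)] for _ in range(d_in)]
down = [[random.gauss(0, 0.01) for _ in range(rank)] for _ in range(d_in)]
# One factor starts at zero, so the adapter initially leaves the model unchanged.
up = [[0.0] * d_out for _ in range(rank)]

x = [[random.gauss(0, 1) for _ in range(d_in)]]
y = lora_forward(x, w_frozen, down, up, alpha=4.0)

full_params = d_in * d_out           # parameters in the frozen matrix
lora_params = rank * (d_in + d_out)  # parameters actually trained
```

Because `up` starts at zero, `y` equals the frozen layer's output until training changes the low-rank factors. For realistic layer sizes the gap between `lora_params` and `full_params` is far larger than in this toy case.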