GNPFA: Generalized Neural Parametric Facial Asset—the authors' VAE-based representation that encodes facial geometry into a latent space, decoupling identity from expression
M2F-D: Media2Face Dataset, the authors' large-scale dataset (60+ hours), built by extracting GNPFA latents from diverse video sources
FACS: Facial Action Coding System, a standard taxonomy that describes facial expressions as combinations of anatomically based action units, used here to create personalized blendshapes
LVE: Lip Vertex Error—metric measuring the Euclidean distance between generated and ground-truth lip vertices
FDD: Face Dynamics Deviation—metric measuring the difference in motion standard deviation between generated and real sequences
Classifier-Free Guidance: A technique in diffusion models to control the strength of conditional inputs (like style or audio) by linearly combining the conditional and unconditional noise predictions; a guidance scale above 1 extrapolates past the conditional prediction
RoM: Range of Motion—a dataset of 4D facial scans capturing extreme facial movements, used to train the GNPFA
UV space: A 2D coordinate system used to map textures or geometry onto a 3D model surface; here used for geometry images
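
The LVE, FDD, and Classifier-Free Guidance entries above can be made concrete with a minimal sketch. It relies on conventions that are common in speech-driven animation work but are assumptions here, not details given in this glossary: LVE is taken as the per-frame maximum L2 error over lip vertices, averaged over frames, and FDD compares, per vertex, the temporal standard deviation of the displacement norm relative to a neutral template. The function names, the `template`, `lip_idx`, and `region_idx` arguments, and the list-of-tuples data layout are all hypothetical.

```python
import math
from statistics import pstdev

# A sequence is a list of frames; a frame is a list of (x, y, z) vertices.

def lip_vertex_error(pred, gt, lip_idx):
    """Assumed LVE: max L2 error over lip vertices per frame, averaged over frames."""
    per_frame_max = [
        max(math.dist(p_frame[i], g_frame[i]) for i in lip_idx)
        for p_frame, g_frame in zip(pred, gt)
    ]
    return sum(per_frame_max) / len(per_frame_max)

def face_dynamics_deviation(pred, gt, template, region_idx):
    """Assumed FDD: mean over vertices of (std of GT motion - std of predicted motion).

    Motion is the L2 norm of a vertex's displacement from the neutral template;
    the standard deviation is taken over time.
    """
    def dynamics(seq, i):
        return pstdev(math.dist(frame[i], template[i]) for frame in seq)

    diffs = [dynamics(gt, i) - dynamics(pred, i) for i in region_idx]
    return sum(diffs) / len(diffs)

def classifier_free_guidance(eps_uncond, eps_cond, scale):
    """Combine noise predictions: scale 0 is unconditional, 1 is conditional,
    and values above 1 push past the conditional prediction."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]

# Toy data: 2 frames, 2 vertices. The prediction never moves off the template.
template = [(0, 0, 0), (1, 0, 0)]
pred = [[(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (1, 0, 0)]]
gt = [[(0, 0, 0), (1, 0, 3)], [(0, 4, 0), (1, 0, 0)]]

lve = lip_vertex_error(pred, gt, lip_idx=[0, 1])
fdd = face_dynamics_deviation(pred, gt, template, region_idx=[0, 1])
guided = classifier_free_guidance([0.0, 1.0], [1.0, 3.0], scale=2.0)
```

In the toy data the static prediction has zero motion variance, so the FDD value is positive, meaning the ground truth moves more than the prediction under this sign convention.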