ID Routing: A mechanism that predicts which identity (from a set of references) should influence a specific spatial location in the generated image's feature map.
Intrinsic ID features: Features from the penultimate layer of a face recognition backbone, representing identity in a way that is invariant to pose and expression.
Face structure features: Shallow features from the face backbone combined with CLIP local features, capturing spatial details like shape and texture.
Q-Former: A transformer module that queries and aggregates visual features into a fixed set of tokens for the diffusion model.
Identity blending: A failure mode in multi-ID generation in which features from different reference identities are mixed onto a single generated face.
DropToken & DropPath: Regularization techniques that randomly drop tokens (DropToken) or entire network paths (DropPath) during training, so the model cannot depend on any single token or branch and must rely on more robust features.
Gumbel Softmax: A reparameterization trick allowing differentiable sampling from a categorical distribution, used here for the discrete routing decision.
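
To make the ID Routing and Gumbel Softmax entries concrete, here is a minimal sketch of a routing decision at one spatial location. It is an illustration under stated assumptions, not the method's actual implementation: the names `gumbel_softmax` and `route_location`, the temperature `tau`, and the toy logits and features are all hypothetical, and the code covers only the forward pass in plain Python.

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Return a relaxed (soft) one-hot sample over the categories in `logits`."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise is -log(-log(U)) with U ~ Uniform(0, 1).
        # U is clamped away from 0 and 1 to keep both logs finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / tau)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def route_location(logits, id_features, tau=1.0, hard=False, rng=random):
    """Pick (or softly mix) one reference identity for a single location.

    `logits` holds one routing score per reference ID (hypothetical input);
    `id_features` holds one feature vector per reference ID.
    """
    weights = gumbel_softmax(logits, tau, rng)
    if hard:
        # Discrete decision: keep only the highest-weight identity.
        # Forward pass only; autograd frameworks usually pair this with a
        # straight-through estimator so gradients still flow.
        winner = max(range(len(weights)), key=weights.__getitem__)
        weights = [1.0 if i == winner else 0.0 for i in range(len(weights))]
    dim = len(id_features[0])
    mixed = [sum(w * f[d] for w, f in zip(weights, id_features))
             for d in range(dim)]
    return weights, mixed


if __name__ == "__main__":
    rng = random.Random(0)
    # Two reference IDs with toy 2-D features; scores favor the second ID.
    weights, mixed = route_location(
        [0.2, 2.0], [[1.0, 0.0], [0.0, 1.0]], tau=0.5, hard=True, rng=rng
    )
    print(weights, mixed)
```

With `hard=True` each location receives features from exactly one identity, which is the property that guards against identity blending. A lower `tau` pushes the soft weights closer to one-hot, while a higher `tau` spreads them more evenly across the references.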