mGPT: Multimodal Generative PreTraining—a decoder-only transformer pretrained on large-scale multimodal sequences (text + discrete image tokens).
FP-SFT: Flexible Progressive Supervised Finetuning—a training strategy starting with low-resolution images and progressively increasing resolution, using variable aspect ratios.
Uni-Rep: Unambiguous Image Representation—an enhanced token sequence format that adds explicit height, width, and row-break tokens to resolve ambiguity in 1D flattened image sequences.
Omni-SFT: Omnipotent Supervised Finetuning—a final tuning stage incorporating diverse tasks (generation, understanding, editing, dense prediction) to create a generalist model.
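z-loss: An auxiliary loss term, (log Z)^2, where Z is the softmax partition function (the sum of the exponentiated logits). It stabilizes training by keeping log Z close to zero, which prevents the logits from drifting to large magnitudes.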
Chameleon: A family of multimodal models by Meta used here as the initialization checkpoint (7B and 30B variants).
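
The z-loss entry above can be made concrete with a small sketch. It is a minimal illustration in plain Python, not the training code behind these models: the function names and the weight `coeff=1e-4` are assumptions chosen for the example. It shows that the cross-entropy is unchanged when every logit is shifted by the same constant, while the z-loss term grows, which is why the penalty discourages logit drift.

```python
import math

def log_partition(logits):
    """Numerically stable log Z, where Z = sum(exp(logit)) over the vocabulary."""
    m = max(logits)
    return m + math.log(sum(math.exp(x - m) for x in logits))

def z_loss(logits, coeff=1e-4):
    """Auxiliary penalty coeff * (log Z)^2. The default coeff is an assumed value."""
    return coeff * log_partition(logits) ** 2

def cross_entropy(logits, target):
    """Standard softmax cross-entropy for a single target index."""
    return log_partition(logits) - logits[target]

def total_loss(logits, target, coeff=1e-4):
    """Cross-entropy plus the z-loss penalty."""
    return cross_entropy(logits, target) + z_loss(logits, coeff)

# Shifting all logits by +10 leaves the cross-entropy the same,
# but raises log Z by 10 and therefore raises the z-loss.
small = [1.0, 2.0, 0.5]
shifted = [x + 10.0 for x in small]
print(cross_entropy(small, 1), cross_entropy(shifted, 1))
print(z_loss(small), z_loss(shifted))
```

Because softmax is invariant to a uniform shift of the logits, the cross-entropy alone gives the model no signal to keep log Z bounded; the squared penalty supplies that signal.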