V2PE: Variable Visual Position Encoding—a mechanism using fractional position increments for visual tokens to fit more visual context into the window
MPO: Mixed Preference Optimization—a post-training phase combining preference loss (DPO), quality loss (BCO), and generation loss to align model outputs
SFT: Supervised Fine-Tuning—training the model on high-quality instruction-response pairs
DPO: Direct Preference Optimization—a method to align models to human preferences without a separate reward model
BCO: Binary Classifier Optimization—used here as a quality loss to help the model distinguish absolute response quality
Pixel Unshuffle: An operation that folds each small spatial block of a feature map into the channel dimension, trading spatial resolution for channel depth and shortening the token sequence (used here so that each 448x448 tile is represented by 256 visual tokens)
InternEVO: An optimized training infrastructure extending ZeRO for efficient large-scale MLLM training
VisualPRM: Visual Process Reward Model—a critic model used during inference to score steps in a chain-of-thought solution
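
Two of the entries above, V2PE and Pixel Unshuffle, describe concrete index and tensor manipulations, so a small sketch may help. The Python below is illustrative only and is not taken from the model's code. The function names, the `visual_step` value of 1/4, the 14-pixel patch size (which gives a 32x32 patch grid for a 448x448 tile), and the order in which block features are concatenated are all assumptions made for the example.

```python
from fractions import Fraction


def v2pe_position_ids(token_kinds, visual_step=Fraction(1, 4)):
    """Assign a position index to each token (hypothetical helper).

    The first token gets position 0. Each later token advances the index
    by 1 if it is a text token, or by the smaller `visual_step` if it is
    a visual token. The step value here is an assumed example.
    """
    positions = []
    current = Fraction(0)
    for i, kind in enumerate(token_kinds):
        if i > 0:
            current += 1 if kind == "text" else visual_step
        positions.append(current)
    return positions


def pixel_unshuffle(grid, block=2):
    """Fold each block x block patch neighbourhood into the channel axis.

    `grid` is an H x W nested list whose cells are feature lists of
    length C. The result is (H/block) x (W/block) with cells of length
    C * block * block. The concatenation order is an illustrative choice.
    """
    height, width = len(grid), len(grid[0])
    if height % block or width % block:
        raise ValueError("grid size must be divisible by block")
    out = []
    for i in range(0, height, block):
        row = []
        for j in range(0, width, block):
            merged = []
            for di in range(block):
                for dj in range(block):
                    merged.extend(grid[i + di][j + dj])
            row.append(merged)
        out.append(row)
    return out


# Two text tokens, four visual tokens, then one text token.
kinds = ["text", "text"] + ["visual"] * 4 + ["text"]
positions = v2pe_position_ids(kinds)

# Assumed 14-pixel patches: a 448x448 tile becomes a 32x32 grid of patches.
side = 448 // 14
patch_grid = [[[0.0] for _ in range(side)] for _ in range(side)]
folded = pixel_unshuffle(patch_grid, block=2)
tokens_before = side * side
tokens_after = len(folded) * len(folded[0])
```

With these assumed settings, the four visual tokens together consume one unit of position range instead of four, and the tile's token count drops by a factor of four while each remaining token carries four times as many channels.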