PARL: Parallel-Agent Reinforcement Learning—a training framework where an orchestrator agent learns to manage frozen sub-agents to solve tasks concurrently
SFT: Supervised Fine-Tuning—training a model on labeled examples to teach it how to follow instructions
Zero-Vision SFT: A post-training technique using only text-based programmatic data (like Python code) to activate visual reasoning capabilities without actual image data
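NaViT: Native Resolution ViT, a vision transformer strategy that splits each image into patches at its native resolution and aspect ratio and packs the patches of several images into one sequence, instead of resizing or padding every image to a fixed size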
MoE: Mixture of Experts, a neural network architecture in which a learned router sends each token to a small subset of expert subnetworks, so only a fraction of the model's parameters is active for any given token
Credit assignment: The problem in RL of determining which past action is responsible for a final positive or negative outcome
Critical steps: A metric for the time cost of a parallel system, defined as the length of the longest sequential path in the execution graph (analogous to the critical path method)
MoonViT-3D: The specific vision encoder used in K2.5, capable of processing images and compressed video frames
Generative Reward Model: A model trained to evaluate the quality of model-generated outputs and provide a reward signal for RL
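The NaViT entry can be made concrete with a minimal sketch of the packing idea. It assumes a 14-pixel patch size and a greedy first-fit-decreasing packer. The patch size, `max_len`, and the function names `num_patches` and `pack` are illustrative assumptions, not MoonViT-3D's actual settings. A real packed-sequence encoder also needs per-image attention masks and position ids so that patches from different images do not attend to each other. Those are omitted here.

```python
import math
from typing import List, Tuple


def num_patches(height: int, width: int, patch: int = 14) -> int:
    # The image keeps its native size; it is cut into ceil(H/p) x ceil(W/p) patches.
    # patch=14 is an assumed value for illustration.
    return math.ceil(height / patch) * math.ceil(width / patch)


def pack(images: List[Tuple[int, int]], max_len: int, patch: int = 14) -> List[List[int]]:
    """Greedy first-fit-decreasing packing of images into sequences of at most max_len patches.

    Returns one list of image indices per packed sequence.
    """
    lengths = [num_patches(h, w, patch) for h, w in images]
    # Place the largest images first; this usually leaves less unused space.
    order = sorted(range(len(images)), key=lambda i: lengths[i], reverse=True)
    bins: List[List[int]] = []
    used: List[int] = []
    for i in order:
        if lengths[i] > max_len:
            raise ValueError(f"image {i} needs {lengths[i]} patches > max_len={max_len}")
        for b in range(len(bins)):
            if used[b] + lengths[i] <= max_len:  # first existing sequence with room
                bins[b].append(i)
                used[b] += lengths[i]
                break
        else:  # no existing sequence fits, so open a new one
            bins.append([i])
            used.append(lengths[i])
    return bins


example_images = [(224, 224), (448, 224), (140, 280), (70, 70)]  # (height, width)
print([num_patches(h, w) for h, w in example_images])  # [256, 512, 200, 25]
print(pack(example_images, max_len=512))                # [[1], [0, 2, 3]]
```

The three smaller images share one sequence (481 of 512 slots) rather than each being padded out to a fixed resolution. The remaining slack in a sequence would still be padded.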
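The MoE entry's "activated sparsely per token" is usually implemented as top-k routing. Below is a minimal pure-Python sketch of that common scheme. The router weights, the toy scaling "experts", and k=2 are illustrative assumptions. This is not K2.5's routing code, which the glossary does not describe, and production routers typically add load-balancing terms that this sketch omits.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(token: Vector, router_w: List[Vector], k: int) -> Tuple[List[int], List[float]]:
    # One router logit per expert: dot product of the token with that expert's router row.
    logits = [sum(t * w for t, w in zip(token, row)) for row in router_w]
    # Sparse activation: keep only the k experts with the highest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Gate weights are renormalized over the selected experts only.
    return top, softmax([logits[i] for i in top])


def moe_layer(token: Vector, router_w: List[Vector], experts: List[Expert], k: int = 2) -> Vector:
    top, gates = route(token, router_w, k)
    out = [0.0] * len(token)
    for i, g in zip(top, gates):
        y = experts[i](token)  # unselected experts are never evaluated
        out = [o + g * v for o, v in zip(out, y)]
    return out


def make_expert(scale: float) -> Expert:
    # Hypothetical stand-in for an expert feed-forward network: it just scales its input.
    return lambda v: [scale * x for x in v]


experts = [make_expert(i + 1) for i in range(4)]
router_w = [[2.0, 0.0], [0.5, 0.0], [1.0, 0.0], [-1.0, 0.0]]
token = [1.0, 0.0]
print(route(token, router_w, k=2)[0])  # [0, 2]
print(moe_layer(token, router_w, experts))
```

With four experts and k=2, only half of the expert computation runs for this token. The same ratio is what lets a large MoE model keep its per-token compute well below its total parameter count.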
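The Critical steps entry defines cost as the longest sequential path through the execution graph. A small sketch makes this concrete. It treats a trace as a directed acyclic graph of tasks, each taking some number of sequential steps, and returns the longest weighted dependency chain. The function name `critical_steps`, the node names, and the step counts are hypothetical. The sketch follows the glossary's longest-path definition rather than any exact formula from the PARL training setup.

```python
from collections import deque
from typing import Dict, List


def critical_steps(cost: Dict[str, int], deps: Dict[str, List[str]]) -> int:
    """Longest weighted path in a DAG of tasks.

    cost[n] -- sequential steps task n takes
    deps[n] -- tasks that must finish before n can start
    """
    # Successor lists and in-degrees for Kahn's topological order.
    succ: Dict[str, List[str]] = {n: [] for n in cost}
    indeg: Dict[str, int] = {n: 0 for n in cost}
    for n, parents in deps.items():
        for p in parents:
            succ[p].append(n)
            indeg[n] += 1

    start = {n: 0 for n in cost}  # earliest step at which each task can begin
    finish: Dict[str, int] = {}
    queue = deque(n for n in cost if indeg[n] == 0)
    while queue:
        n = queue.popleft()
        finish[n] = start[n] + cost[n]
        for s in succ[n]:
            # A task can start only after its slowest prerequisite has finished.
            start[s] = max(start[s], finish[n])
            indeg[s] -= 1
            if indeg[s] == 0:
                queue.append(s)
    if len(finish) != len(cost):
        raise ValueError("dependency graph contains a cycle")
    return max(finish.values(), default=0)


# Hypothetical trace: the orchestrator plans, three sub-agents run in parallel, then it merges.
example_cost = {"plan": 2, "agent_a": 5, "agent_b": 3, "agent_c": 7, "merge": 1}
example_deps = {
    "agent_a": ["plan"],
    "agent_b": ["plan"],
    "agent_c": ["plan"],
    "merge": ["agent_a", "agent_b", "agent_c"],
}
print(critical_steps(example_cost, example_deps))  # 10 = 2 + 7 + 1
print(sum(example_cost.values()))                   # 18 if every task ran sequentially
```

Only the slowest sub-agent (7 steps) counts toward the critical path. Parallelism therefore reduces this metric only when it shortens the longest chain, not merely when it adds more concurrent work.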