GRPO: Group Relative Policy Optimization—an RL algorithm that normalizes rewards within a group of outputs sampled for the same prompt, removing the need for a separate value-function critic
Hi-GRPO: Hierarchical Group Relative Policy Optimization—the authors' proposed method that splits generation into semantic planning (coarse) and visual refinement (fine) steps
ShapeLLM-Omni: The base autoregressive model used, which unifies 3D generation and understanding by treating discretized 3D tokens like text
HPS: Human Preference Score—a reward model trained to predict human aesthetic preferences for images
MME-3DR: Multi-Modal Evaluation for 3D Reasoning—the authors' new benchmark focusing on implicit reasoning tasks like spatial relations and mechanical affordances
VQVAE: Vector Quantized Variational AutoEncoder—a method to compress high-dimensional data (like 3D shapes) into discrete tokens
CLIP Score: A metric measuring the semantic similarity between the generated 3D object's rendered images and the text prompt
LMM: Large Multi-modal Model—a model such as Qwen2-VL that can process both images and text; used here as a reward function
Chain-of-Thought (CoT): A prompting technique where the model generates intermediate reasoning steps before the final answer
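
To make the GRPO entry concrete, here is a minimal sketch of group-relative reward normalization. It is not taken from the authors' code: the function name `group_relative_advantages`, the `eps` stabilizer, the use of the population standard deviation, and the sample rewards are all assumptions chosen for illustration.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of outputs for the same prompt.

    Each output's advantage is its reward minus the group mean, divided by
    the group standard deviation. No learned value function is involved.
    `eps` (an assumed stabilizer) avoids division by zero when all rewards
    in the group are identical.
    """
    mean = fmean(rewards)
    std = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards for four outputs sampled from the same prompt.
rewards = [0.2, 0.5, 0.9, 0.4]
advantages = group_relative_advantages(rewards)
```

Outputs scoring above the group mean receive positive advantages and those below receive negative ones, so the group itself serves as the baseline that a critic would otherwise provide.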