CLIP: Contrastive Language-Image Pre-training—a model trained on image-text pairs to learn aligned visual and textual representations
ULIP: A baseline method (Unified Representation of Language, Images, and Point Clouds for 3D Understanding) that aligns 3D features with frozen CLIP image and text features
Point-BERT: A Transformer-based 3D encoder that groups a point cloud into local patches, treats the patches as a token sequence, and is pre-trained by predicting masked tokens
SparseConv: Sparse Convolution—a convolutional network designed for efficient processing of sparse 3D voxel data
Linear Probing: Evaluating a pre-trained encoder by freezing it and training a simple linear classifier on top
Zero-shot classification: Classifying objects into categories not seen during training by comparing each object's features to text embeddings of the category names
MLP: Multi-Layer Perceptron—a basic feedforward neural network consisting of fully connected layers
Objaverse-LVIS: A subset of the large-scale Objaverse 3D object dataset annotated with LVIS category labels, used for evaluation