VLM: Vision-Language Model, an AI model that jointly processes and understands visual input (images) and text.
MCQ: Multiple-Choice Question—An evaluation format where the model must select the correct answer from a set of predefined options.
Non-Optical: Imagery captured outside the visible light spectrum, such as Synthetic Aperture Radar (SAR) imagery; used for tasks such as flood detection and earthquake assessment.
Temporal Analysis: The process of analyzing data across time, often using sequences of images to detect changes like urban development or disaster impact.
IoU: Intersection over Union, a metric for object detection and segmentation accuracy, computed as the area of overlap between the predicted and ground-truth regions divided by the area of their union (0 for no overlap, 1 for a perfect match).
mIoU: Mean Intersection over Union—The average IoU calculated across all classes or instances in a dataset.
BERTScore: A metric for evaluating text generation (like image captions) by computing the semantic similarity between candidate and reference sentences using contextual embeddings.
Grounding: The ability of a model to link textual concepts to specific regions or objects within an image (e.g., bounding boxes).
SAR: Synthetic Aperture Radar, a radar imaging technique that produces two-dimensional images or three-dimensional reconstructions of objects; a common source of imagery for non-optical geospatial tasks.
Hallucination: A phenomenon where an AI model generates incorrect or nonsensical information that is not supported by the input data.
Referring Expression: A natural language description that singles out a specific object in an image; in the associated task, the model must identify or segment that object.
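
The IoU and mIoU entries above can be illustrated with a short sketch. It assumes axis-aligned bounding boxes given as (x_min, y_min, x_max, y_max) and averages over prediction/ground-truth pairs (the per-instance reading of mIoU). The names `box_iou` and `mean_iou` are hypothetical helpers introduced here for illustration, not code from any benchmark. Segmentation IoU uses the same ratio with pixel masks in place of boxes.

```python
from typing import Sequence, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max), with x_max >= x_min and y_max >= y_min.
Box = Tuple[float, float, float, float]


def box_iou(pred: Box, truth: Box) -> float:
    """Intersection area divided by union area for two axis-aligned boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix_min = max(pred[0], truth[0])
    iy_min = max(pred[1], truth[1])
    ix_max = min(pred[2], truth[2])
    iy_max = min(pred[3], truth[3])

    # Clamp at zero so disjoint boxes contribute no intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_pred = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_truth = (truth[2] - truth[0]) * (truth[3] - truth[1])
    union = area_pred + area_truth - inter

    # Two degenerate (zero-area) boxes have no union; report 0 by convention.
    return inter / union if union > 0 else 0.0


def mean_iou(pairs: Sequence[Tuple[Box, Box]]) -> float:
    """Average IoU over (prediction, ground truth) pairs; 0.0 for an empty input."""
    scores = [box_iou(pred, truth) for pred, truth in pairs]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    a = (0.0, 0.0, 2.0, 2.0)
    b = (1.0, 1.0, 3.0, 3.0)
    print(box_iou(a, b))              # overlap 1, union 7
    print(mean_iou([(a, a), (a, b)]))
```

A class-averaged mIoU would instead compute one IoU per class and average those values; which averaging a given benchmark uses should be checked against its own definition.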