RS-VLM: Remote Sensing Vision-Language Model—an AI model designed to interpret satellite and aerial imagery by combining visual inputs with text.
CoT: Chain-of-Thought—a prompting strategy where the model generates intermediate reasoning steps before the final answer.
GRPO: Group Relative Policy Optimization—a reinforcement learning algorithm that samples a group of responses for each prompt and scores each response relative to its group's mean reward, which serves as the baseline and eliminates the need for a critic network.
LCR: Logical Consistency Reward—a proposed reward signal that penalizes the model if its selected answer changes when the options are permuted, encouraging the decision to be anchored in the reasoning trace rather than in option position.
Logical Hallucination: A phenomenon where a model provides a correct final answer but supports it with incorrect or contradictory reasoning.
SFT: Supervised Fine-Tuning—training on labeled examples to initialize the model's behavior before reinforcement learning.
Geometric Primitives: Basic shapes and structural features (e.g., scale, orientation, density) extracted from images to form the basis of spatial reasoning.
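To make the GRPO entry concrete, the group-relative baseline can be sketched as below. This is a minimal illustration assuming the common formulation in which each sampled response's reward is normalized by its group's mean and standard deviation; the function name `group_relative_advantages` and the use of the population standard deviation are assumptions, not details taken from this document.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Score each sampled response against the other responses for the same prompt.

    The group mean acts as the baseline a critic network would otherwise
    predict; dividing by the group spread puts groups on a comparable scale.
    """
    if not rewards:
        raise ValueError("need at least one reward")
    baseline = mean(rewards)
    spread = pstdev(rewards)  # assumption: population std; some implementations use the sample std
    return [(r - baseline) / (spread + eps) for r in rewards]


# Four responses sampled for the same prompt, scored 1 (correct) or 0 (incorrect).
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
print([round(a, 3) for a in advantages])  # [1.0, -1.0, 1.0, -1.0]
```

Note that a group in which every response receives the same reward yields all-zero advantages, so such a prompt contributes no learning signal under this formulation.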
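The LCR entry describes a penalty for answers that change under option permutation. The sketch below is one hypothetical way to compute such a signal, not the proposed method itself: the `choose` callable stands in for a model rollout, answers are compared by option content rather than by letter, and the reward scale (0.0 when fully consistent, down to -1.0 when the answer flips under every sampled permutation) is an assumption made for illustration.

```python
import random
from typing import Callable, Sequence

# Hypothetical interface: given options in some order, return the index
# (into that order) of the option the model selects.
Chooser = Callable[[Sequence[str]], int]


def logical_consistency_reward(choose: Chooser, options: Sequence[str],
                               num_permutations: int = 4, seed: int = 0) -> float:
    """Hypothetical LCR: 0.0 if the chosen option content survives every
    sampled permutation, otherwise minus the fraction of permutations that flip it."""
    rng = random.Random(seed)
    reference = options[choose(options)]  # answer content under the original order
    flips = 0
    for _ in range(num_permutations):
        shuffled = list(options)
        rng.shuffle(shuffled)
        if shuffled[choose(shuffled)] != reference:
            flips += 1
    return -flips / num_permutations


def anchored(opts: Sequence[str]) -> int:
    return list(opts).index("airport")  # decision tied to option content


def position_biased(opts: Sequence[str]) -> int:
    return 0  # always picks the first slot, regardless of content


options = ["airport", "harbor", "farmland", "stadium"]
print(logical_consistency_reward(anchored, options))  # 0.0
print(logical_consistency_reward(position_biased, options, num_permutations=50))
```

A position-biased chooser is penalized whenever a shuffle moves a different option into the first slot, while a content-anchored chooser is never penalized; in practice each call to `choose` would be a separate model rollout, so the number of sampled permutations trades reward fidelity against compute.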