Reasoning MLLMs: Multimodal models incentivized to produce long intermediate reasoning chains (CoTs) before generating final outputs
Intermediate CoT: The step-by-step reasoning text generated by the model before the final answer, used to explain the decision process
Hallucination: Content in generated text that is inconsistent with factual knowledge, multimodal evidence, or logical context
Rubric-based evaluation: An assessment method where an LLM judge scores model outputs against specific, pre-defined criteria (rubrics) for distinct capabilities
IoU: Intersection over Union, a metric used in grounding tasks to measure the overlap between a predicted bounding box and the ground-truth box
Cognitive Dimensions: The three top-level categories in the paper's taxonomy: Knowledge (facts), Perception (visual/audio sensing), and Reasoning (logic)
H-score: Hallucination-free score, calculated as 1 minus the ratio of hallucinated content, quantifying the absence of errors
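
The two quantitative terms above, IoU and H-score, can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes boxes are given as (x_min, y_min, x_max, y_max) corner coordinates, and that "ratio of hallucinated content" means hallucinated units divided by total units, where the unit (sentence, claim, or reasoning step) is a hypothetical choice made here for illustration. The function and parameter names are likewise illustrative.

```python
from typing import Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) corner coordinates.
Box = Tuple[float, float, float, float]


def iou(pred: Box, truth: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle, if there is one.
    ix_min = max(pred[0], truth[0])
    iy_min = max(pred[1], truth[1])
    ix_max = min(pred[2], truth[2])
    iy_max = min(pred[3], truth[3])

    # Clamp at zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_pred = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_truth = (truth[2] - truth[0]) * (truth[3] - truth[1])
    union = area_pred + area_truth - inter

    # Degenerate (zero-area) boxes have no meaningful overlap.
    return inter / union if union > 0 else 0.0


def h_score(hallucinated_units: int, total_units: int) -> float:
    """Hallucination-free score: 1 minus the hallucinated fraction.

    The counting unit is an assumption of this sketch, not taken
    from the paper.
    """
    if total_units <= 0:
        raise ValueError("total_units must be positive")
    if not 0 <= hallucinated_units <= total_units:
        raise ValueError("hallucinated_units must lie in [0, total_units]")
    return 1.0 - hallucinated_units / total_units
```

For example, under these assumptions `iou((0, 0, 2, 2), (1, 1, 3, 3))` is the overlap area 1 divided by the union area 7, and `h_score(1, 4)` is 0.75. A higher H-score means less hallucinated content, with 1.0 meaning none was flagged.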