IK: Insufficient Knowledge—The model fails on sub-problems (basic concepts) and consequently fails the main composite problem
IG: Inadequate Generalization—The model solves all sub-problems correctly but fails to combine them to solve the main composite problem
CM: Complete Mastery—The model correctly solves both the sub-problems and the main composite problem
RM: Rote Memorization—The model fails on sub-problems but 'correctly' answers the main problem, suggesting guessing or data leakage
KCA: Knowledge Concept Augmentation—A strategy of providing explicit textbook definitions/formulas of knowledge concepts to the model to aid reasoning
LMM: Large Multimodal Model—AI models capable of processing and reasoning over both text and visual inputs
CoT: Chain of Thought—A prompting technique that encourages models to generate intermediate reasoning steps
Multi-step problem: A math problem requiring the application of multiple distinct knowledge concepts (e.g., area formula + subtraction)
One-step problem: A math problem testing a single atomic knowledge concept
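The four outcome categories IK, IG, CM and RM amount to a decision over two signals per multi-step problem: whether the sub-problems are solved and whether the composite problem is solved. The Python sketch below makes that mapping explicit. It assumes "fails on sub-problems" means "gets at least one sub-problem wrong", which is one reading among several (a stricter reading would require all sub-problems to be wrong). The names `Outcome` and `classify`, along with the sample `results`, are hypothetical and exist only for illustration.

```python
from collections import Counter
from enum import Enum


class Outcome(Enum):
    """The four outcome categories defined in the glossary above."""
    IK = "Insufficient Knowledge"
    IG = "Inadequate Generalization"
    CM = "Complete Mastery"
    RM = "Rote Memorization"


def classify(sub_correct: list[bool], main_correct: bool) -> Outcome:
    """Assign one multi-step problem to an outcome category.

    Assumption: "fails on sub-problems" is read as "gets at least one
    sub-problem wrong"; other readings (e.g. all wrong) are possible.
    """
    if not sub_correct:
        raise ValueError("a multi-step problem needs at least one sub-problem")
    all_subs_ok = all(sub_correct)
    if all_subs_ok and main_correct:
        return Outcome.CM  # solves the parts and the whole
    if all_subs_ok:
        return Outcome.IG  # knows the parts but cannot combine them
    if main_correct:
        return Outcome.RM  # right final answer without the underlying knowledge
    return Outcome.IK  # missing knowledge leads to failure on the whole


# Hypothetical per-problem results: (sub-problem correctness, main correctness)
results = [
    ([True, True], True),
    ([True, True], False),
    ([True, False], True),
    ([False, False], False),
    ([True, True], True),
]

# Tally how many problems fall into each category.
counts = Counter(classify(subs, main) for subs, main in results)
```

With this sample, `counts` holds two CM results and one each of IG, RM and IK. Keeping the category test as a pure function of the two signals makes it straightforward to swap in a stricter RM/IK definition without touching the tallying code.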