RLVR: Reinforcement Learning with Verifiable Rewards—a training method where models are rewarded for correct final answers in deterministic tasks (e.g., math), encouraging correct reasoning paths
SCM: Structural Causal Model—a framework representing causal relationships between variables using directed acyclic graphs
CoT: Chain-of-Thought—intermediate reasoning steps generated by a model before the final answer
LRM: Large Reasoning Model—models specifically trained (often via RL) to generate extensive internal or external thinking processes (e.g., o1, DeepSeek-R1)
ATE: Average Treatment Effect—a metric measuring the expected change in an outcome variable (e.g., the Answer) when an intervention is applied to a treatment variable (e.g., the CoT)
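For reference, the ATE above can be written in standard causal notation (a sketch: Y denotes the Answer, do(·) Pearl's intervention operator; these symbols are conventional, not taken from this glossary):

```latex
\mathrm{ATE} \;=\; \mathbb{E}\!\left[\,Y \mid do(\mathrm{CoT}=c)\,\right] \;-\; \mathbb{E}\!\left[\,Y \mid do(\mathrm{CoT}=c')\,\right]
```

A nonzero ATE indicates the CoT causally influences the Answer (Type I); an ATE near zero suggests the CoT is explanatory but causally inert (Type II).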
Causal Chain: Type I SCM structure (Instruction → Thinking → CoT → Answer) representing ideal, faithful reasoning where steps determine the result
Common Cause: Type II SCM structure where Instruction independently determines both CoT and Answer; the CoT correlates with the answer and appears to explain it, but does not cause it
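The contrast between the two SCM structures above can be made concrete with a toy simulation: intervening on the CoT (do(CoT=c)) shifts the Answer only under the causal chain, not under the common cause. This is an illustrative sketch with made-up binary variables, not the paper's experimental setup:

```python
import random

def simulate(scm_type, do_cot=None, n=10_000, seed=0):
    """Sample the Answer under an optional intervention do(CoT = do_cot).

    Type I  ("chain"):        Instruction -> CoT -> Answer
    Type II ("common_cause"): Instruction -> CoT, Instruction -> Answer
    All variables are binary; the structural equations are hypothetical.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(n):
        instruction = rng.randint(0, 1)
        cot = instruction if do_cot is None else do_cot  # do(CoT = c)
        if scm_type == "chain":
            answer = cot          # Answer reads off the CoT
        else:
            answer = instruction  # Answer ignores the CoT entirely
        total += answer
    return total / n

def ate(scm_type):
    # ATE = E[Answer | do(CoT=1)] - E[Answer | do(CoT=0)]
    return simulate(scm_type, do_cot=1) - simulate(scm_type, do_cot=0)

print(ate("chain"))         # -> 1.0: CoT causally determines the Answer
print(ate("common_cause"))  # -> 0.0: CoT explains but does not cause
```

Because the same seed is reused across interventions, the common-cause ATE is exactly zero here; in practice one would estimate it from paired interventions on generated reasoning traces.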
Distillation: Training a smaller student model using the outputs (reasoning traces) of a larger teacher model
Thinking: A specific variable in LRMs representing the implicit or explicit long-context exploration and reflection process before the final response
ICL: In-Context Learning—providing examples within the prompt to guide model behavior without weight updates