RLVR: Reinforcement Learning with Verifiable Rewards—a training method where models are rewarded for correct final answers in deterministic tasks (e.g., math), encouraging correct reasoning paths
SCM: Structural Causal Model—a framework representing causal relationships between variables using directed acyclic graphs
CoT: Chain-of-Thought—intermediate reasoning steps generated by a model before the final answer
LRM: Large Reasoning Model—models specifically trained (often via RL) to generate extensive internal or external thinking processes (e.g., o1, DeepSeek-R1)
ATE: Average Treatment Effect—a metric measuring the expected change in an outcome variable (e.g., the Answer) when an intervention is applied to a treatment variable (e.g., the CoT)
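For reference, the ATE above can be written in standard causal notation (a sketch: Y denotes the Answer, do(·) Pearl's intervention operator; these symbols are conventional, not taken from this glossary):

```latex
\mathrm{ATE} \;=\; \mathbb{E}\!\left[\,Y \mid do(\mathrm{CoT}=c)\,\right] \;-\; \mathbb{E}\!\left[\,Y \mid do(\mathrm{CoT}=c')\,\right]
```

A nonzero ATE indicates the CoT causally influences the Answer (Type I); an ATE near zero suggests the CoT is explanatory but causally inert (Type II).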
Causal Chain: Type I SCM structure (Instruction → Thinking → CoT → Answer) representing ideal, faithful reasoning where steps determine the result
Common Cause: Type II SCM structure where Instruction independently determines both CoT and Answer; the CoT correlates with the answer and appears to explain it, but does not cause it
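The contrast between the two SCM structures above can be made concrete with a toy simulation: intervening on the CoT (do(CoT=c)) shifts the Answer only under the causal chain, not under the common cause. This is an illustrative sketch with made-up binary variables, not the paper's experimental setup:

```python
import random

def simulate(scm_type, do_cot=None, n=10_000, seed=0):
    """Sample the Answer under an optional intervention do(CoT = do_cot).

    Type I  ("chain"):        Instruction -> CoT -> Answer
    Type II ("common_cause"): Instruction -> CoT, Instruction -> Answer
    All variables are binary; the structural equations are hypothetical.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(n):
        instruction = rng.randint(0, 1)
        cot = instruction if do_cot is None else do_cot  # do(CoT = c)
        if scm_type == "chain":
            answer = cot          # Answer reads off the CoT
        else:
            answer = instruction  # Answer ignores the CoT entirely
        total += answer
    return total / n

def ate(scm_type):
    # ATE = E[Answer | do(CoT=1)] - E[Answer | do(CoT=0)]
    return simulate(scm_type, do_cot=1) - simulate(scm_type, do_cot=0)

print(ate("chain"))         # -> 1.0: CoT causally determines the Answer
print(ate("common_cause"))  # -> 0.0: CoT explains but does not cause
```

Because the same seed is reused across interventions, the common-cause ATE is exactly zero here; in practice one would estimate it from paired interventions on generated reasoning traces.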
Distillation: Training a smaller student model using the outputs (reasoning traces) of a larger teacher model
Thinking: A specific variable in LRMs representing the implicit or explicit long-context exploration and reflection process before the final response
ICL: In-Context Learning—providing examples within the prompt to guide model behavior without weight updates