Discovering Hierarchical Latent Capabilities of Language Models via Causal Representation Learning

📝 Paper Summary

Language Model Evaluation · Causal Representation Learning · Scaling Laws

The paper proposes Hierarchical Component Analysis (HCA) to discover a causal hierarchy of latent capabilities in LLMs—from general problem-solving to instruction-following to math reasoning—by leveraging performance heterogeneity across different base models.

Core Problem

Evaluating LLM capabilities is hindered by complex confounding effects (like base model heterogeneity) and interdependencies among skills, making it difficult to understand how specific capabilities causally influence downstream performance.

Why it matters:

Rigorous causal evaluation usually requires prohibitively costly interventions, such as retraining models from scratch, to control for confounders.
Simple leaderboards rank models but fail to explain *why* performance improves or how skills like instruction-following enable complex reasoning.
Without understanding the causal structure of capabilities, it is unclear which skills to target during post-training to maximize downstream gains.

Concrete Example: Fine-tuning on instruction data might improve math problem-solving not because the model learned math, but because it learned to format the solution correctly. Standard evaluations cannot distinguish whether an intervention improved the core reasoning skill or just the upstream prerequisite of following instructions.

Key Novelty

Hierarchical Component Analysis (HCA) for Causal Capability Discovery

Models benchmark performance as a linear transformation of latent capability factors that are organized in a Directed Acyclic Graph (DAG).
Treats different base models (e.g., Llama-3 vs. Qwen2.5) as distinct 'domains' or 'views' to identify invariant causal structures, controlling for the base model as a common confounder.
Uses a novel algorithm (HCA) that combines Independent Component Analysis (ICA) with row-residual extraction to recover the hierarchical structure from observed performance matrices.

Architecture

Conceptual diagram of the hierarchical causal model. It shows 'Base Model' as a parent node influencing latent capabilities (A, B, C), which in turn influence Benchmark scores. Fine-tuning is modeled as an intervention.

Evaluation Highlights

Identified a stable 3-node causal hierarchy across 1500+ models: General Problem Solving → Instruction Following → Math Reasoning.
Found that the 'General Problem Solving' latent factor correlates strongly with pre-training compute (FLOPs), following a sigmoid scaling law, while 'Instruction Following' is more malleable through post-training.
Demonstrated that targeted fine-tuning on instruction-following (IFEval) causally improves math reasoning (MATH) performance across Llama-3 and Qwen2.5 families.
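The sigmoid relation between the General Problem Solving factor and pre-training compute can be made concrete with a small curve fit. This is a sketch under assumptions: the four-parameter form (floor, ceiling, slope, midpoint in log10 FLOPs), the helper names, and the synthetic data are hypothetical and not taken from the paper.

```python
import numpy as np

def sigmoid_law(log_flops, floor, ceil, slope, mid):
    """Hypothetical 4-parameter sigmoid in log10(pre-training FLOPs)."""
    return floor + (ceil - floor) / (1.0 + np.exp(-slope * (log_flops - mid)))

def fit_sigmoid_law(log_flops, z1, slopes, mids):
    """Grid-search slope and midpoint; solve floor/ceiling by least squares.

    For fixed (slope, mid), z1 ~ floor * (1 - g) + ceil * g is linear in
    (floor, ceil), so each grid point needs only one small lstsq solve.
    """
    best = None
    for s in slopes:
        for m in mids:
            g = 1.0 / (1.0 + np.exp(-s * (log_flops - m)))  # basis in [0, 1]
            design = np.column_stack([1.0 - g, g])
            coef, *_ = np.linalg.lstsq(design, z1, rcond=None)
            sse = float(np.sum((design @ coef - z1) ** 2))
            if best is None or sse < best[0]:
                best = (sse, coef[0], coef[1], s, m)
    return best[1:]  # floor, ceil, slope, mid

# Synthetic example: z1 scores for models spanning 1e21 to 1e25 FLOPs.
rng = np.random.default_rng(0)
log_c = np.linspace(21.0, 25.0, 60)
z1 = sigmoid_law(log_c, -1.0, 1.5, 2.0, 23.0) + rng.normal(0.0, 0.05, log_c.size)
floor, ceil, slope, mid = fit_sigmoid_law(
    log_c, z1, np.linspace(0.5, 4.0, 36), np.linspace(21.0, 25.0, 81))
```

Splitting the fit into a nonlinear grid over (slope, midpoint) and a linear solve for (floor, ceiling) avoids any optimizer dependency; a library routine such as `scipy.optimize.curve_fit` would be a natural replacement.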

Breakthrough Assessment

8/10

Provides a rigorous theoretical framework for interpreting leaderboard data causally, moving beyond simple rankings to structural understanding of how LLM capabilities depend on each other.

⚙️ Technical Details

Problem Definition

Setting: Modeling observed benchmark scores x as a linear transformation of latent capability factors z (x = Gz), where z follows a structural causal model.
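A worked form of this setting, assuming a linear SCM with independent, non-Gaussian noise ε^(k) in each base-model domain k, and scores already reduced to as many dimensions as there are factors (so G is square and invertible). The symbols A_k and ε^(k) are our notation, and identifying B_k with (I − A_k)^{-1} is an assumption about how the summary's B_k is defined.

```latex
\begin{aligned}
z^{(k)} &= A_k\, z^{(k)} + \varepsilon^{(k)}, \qquad A_k \text{ strictly lower-triangular} \\
\Rightarrow\; z^{(k)} &= (I - A_k)^{-1}\varepsilon^{(k)} = B_k\,\varepsilon^{(k)} \\
x^{(k)} &= G\, z^{(k)} = G B_k\, \varepsilon^{(k)} \\
M_k &\simeq (G B_k)^{-1} = (I - A_k)\, G^{-1}
\end{aligned}
```

Here ≃ means equality up to row permutation and scaling, which is all ICA can deliver. Because I − A_k is lower-triangular, row i of M_k is a combination of only the first i rows of G^{-1}; this is the structure a row-residual step can exploit when G is shared across domains.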

Inputs: Performance matrix X of N models across d benchmarks, partitioned by K distinct base model domains.

Outputs: Latent capability factors z, mixing matrix G, and the causal DAG structure describing relationships between factors.

Pipeline Flow

Data Partitioning (Group models by Base Model)
ICA Unmixing (Recover unmixing matrices M_k for each domain)
Row-Residual Extraction (Recover shared mixing matrix H)
Permutation Alignment (Align factors across domains)
Causal Graph Recovery (Estimate weights of the DAG)
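The Permutation Alignment step is needed because ICA returns components in arbitrary order, sign, and scale for each domain. A minimal sketch, assuming matching rows by absolute cosine similarity to a reference domain is adequate (the paper's actual alignment criterion may differ); `align_to_reference` is a hypothetical helper.

```python
import itertools
import numpy as np

def align_to_reference(M_ref, M):
    """Permute and sign-flip rows of M to best match rows of M_ref.

    Rows are compared by absolute cosine similarity; with only a few latent
    factors, brute force over all permutations is cheap. Row scale is left
    as-is because ICA cannot identify it.
    """
    U_ref = M_ref / np.linalg.norm(M_ref, axis=1, keepdims=True)
    U = M / np.linalg.norm(M, axis=1, keepdims=True)
    sim = U_ref @ U.T  # sim[i, j] = cos(row i of M_ref, row j of M)
    n = M.shape[0]
    best_perm = max(itertools.permutations(range(n)),
                    key=lambda p: sum(abs(sim[i, p[i]]) for i in range(n)))
    perm = np.array(best_perm)
    signs = np.sign(sim[np.arange(n), perm])  # flip rows pointing the wrong way
    signs[signs == 0] = 1.0
    return signs[:, None] * M[perm]

# Example: a second domain whose rows are a shuffled, rescaled, sign-flipped copy.
rng = np.random.default_rng(1)
M_ref = rng.normal(size=(3, 3))
scramble = np.array([[0.0, 0.0, -2.0], [1.5, 0.0, 0.0], [0.0, 0.5, 0.0]])
aligned = align_to_reference(M_ref, scramble @ M_ref)
```

For more than a handful of factors, the brute-force search would normally be replaced by a Hungarian-algorithm solver on the negated |similarity| matrix.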

System Modules

Data Partitioner

Identifies the base model for each leaderboard entry to create domain-specific datasets.
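Grouping leaderboard rows into per-domain score matrices is straightforward. This sketch assumes each entry is a dict with a hypothetical `base_model` key plus one key per benchmark; the real leaderboard export has its own schema, and the scores in the example are made up.

```python
from collections import defaultdict
import numpy as np

BENCHMARKS = ["IFEval", "BBH", "MATH Lvl 5", "GPQA", "MUSR", "MMLU-PRO"]

def partition_by_base_model(entries, benchmarks=BENCHMARKS, min_models=1):
    """Return {base_model: (n_k x d) score matrix}.

    Entries without a known base model are dropped, since they cannot be
    assigned to a domain.
    """
    groups = defaultdict(list)
    for e in entries:
        base = e.get("base_model")
        if not base:
            continue
        groups[base].append([float(e[b]) for b in benchmarks])
    return {base: np.array(rows) for base, rows in groups.items()
            if len(rows) >= min_models}

# Example with two benchmarks and made-up scores.
entries = [
    {"base_model": "Qwen2.5-7B", "IFEval": 0.71, "BBH": 0.52},
    {"base_model": "Qwen2.5-7B", "IFEval": 0.64, "BBH": 0.49},
    {"base_model": "Llama-3-8B", "IFEval": 0.55, "BBH": 0.41},
    {"base_model": None, "IFEval": 0.30, "BBH": 0.20},
]
domains = partition_by_base_model(entries, benchmarks=["IFEval", "BBH"])
```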

ICA Solver

Applies Independent Component Analysis to find the unmixing matrix that maps observed scores to independent sources for each domain.
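To show what the per-domain ICA step computes, here is a compact symmetric FastICA (tanh contrast) in NumPy. It is a textbook stand-in, not the authors' solver; a library implementation such as scikit-learn's `FastICA` would be the usual choice. `fastica_unmixing` is a hypothetical name, and the sketch assumes scores have already been reduced to as many dimensions as latent sources.

```python
import numpy as np

def _sym_decorrelate(W):
    """Return (W W^T)^{-1/2} W, which has orthonormal rows."""
    s, u = np.linalg.eigh(W @ W.T)
    return u @ np.diag(1.0 / np.sqrt(s)) @ u.T @ W

def fastica_unmixing(X, n_iter=500, tol=1e-8, seed=0):
    """Symmetric FastICA on one domain's (n_models x d) score matrix.

    Returns W_total such that (X - X.mean(0)) @ W_total.T has approximately
    independent columns, recovered up to permutation, sign and scale.
    """
    Xc = X - X.mean(axis=0)
    eigval, eigvec = np.linalg.eigh(np.cov(Xc, rowvar=False))
    V = eigvec @ np.diag(1.0 / np.sqrt(eigval)) @ eigvec.T  # whitening matrix
    Z = V @ Xc.T                                             # (d x n), identity covariance
    d, n = Z.shape
    W = _sym_decorrelate(np.random.default_rng(seed).normal(size=(d, d)))
    for _ in range(n_iter):
        G = np.tanh(W @ Z)  # g(w_i^T z) for every source i and sample
        # Fixed-point step: E[z g(w^T z)] - E[g'(w^T z)] w, with g' = 1 - tanh^2
        W_new = _sym_decorrelate(G @ Z.T / n - np.diag((1.0 - G ** 2).mean(axis=1)) @ W)
        # Converged when each row is (anti-)parallel to its previous value
        converged = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0)) < tol
        W = W_new
        if converged:
            break
    return W @ V

# Synthetic check: mix known Laplace sources, then unmix them.
rng = np.random.default_rng(42)
S = rng.laplace(size=(5000, 3))
A = np.array([[1.0, 0.5, 0.2], [0.3, 1.0, 0.4], [0.1, 0.6, 1.0]])
X = S @ A.T
W_total = fastica_unmixing(X)
P = W_total @ A  # close to a scaled permutation matrix if unmixing succeeded
```

In the pipeline this would run once per base-model domain, producing one unmixing matrix per domain.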

Hierarchical Component Analyzer

Solves for the shared mixing matrix H and domain-specific lower-triangular matrices B_k to identify the causal DAG.
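The row-residual idea can be illustrated in a few lines, under assumptions that are ours rather than the paper's exact procedure: rows of each unmixing matrix are already aligned and in causal order, and M_k = (I − A_k) H with H shared and A_k strictly lower-triangular. (Whether the paper's H is this shared factor or its inverse is not settled here.) Removing from row i of M_k its projection onto rows 1..i−1 (Gram–Schmidt, equivalently an LQ decomposition) then leaves a residual direction that is identical in every domain, while the remaining factor is lower-triangular. This sketch recovers H only up to a lower-triangular transform; the additional constraints the paper uses to resolve that are not reproduced. `row_residual_lq` is a hypothetical helper.

```python
import numpy as np

def row_residual_lq(M):
    """LQ decomposition M = T @ Q via QR of M.T, normalized so diag(T) > 0.

    Row i of Q is the unit residual of row i of M after projecting out
    rows 0..i-1 (Gram-Schmidt); T is lower-triangular.
    """
    Qt, R = np.linalg.qr(M.T)  # M.T = Qt @ R  =>  M = R.T @ Qt.T
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    Q = signs[:, None] * Qt.T  # flip rows of Q ...
    T = R.T * signs[None, :]   # ... and matching columns of T, so diag(T) > 0
    return T, Q

# Two hypothetical domains sharing H but with different edge weights.
rng = np.random.default_rng(0)
H = rng.normal(size=(3, 3))
A1 = np.array([[0.0, 0.0, 0.0], [1.2, 0.0, 0.0], [0.0, 0.4, 0.0]])
A2 = np.array([[0.0, 0.0, 0.0], [0.7, 0.0, 0.0], [0.5, -0.3, 0.0]])
T1, Q1 = row_residual_lq((np.eye(3) - A1) @ H)
T2, Q2 = row_residual_lq((np.eye(3) - A2) @ H)
```

Q1 and Q2 coincide because an LQ factorization with positive diagonal is unique and (I − A_k) only adds earlier rows to later ones; the domain-specific edge weights end up entirely in the lower-triangular factors T1 and T2.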

Novel Architectural Elements

Hierarchical Component Analysis (HCA) algorithm: specifically designed to identify causal factors when they are organized hierarchically (triangular adjacency matrix) and shared across heterogeneous domains.
Treatment of 'Base Model' as a confounding variable in a multi-domain causal representation learning framework.

Modeling

Base Models: Llama-3-8B, Llama-3.1-8B, Qwen2.5-7B, Qwen2.5-14B

Training Method: Supervised Fine-Tuning (SFT) for validation experiments

Adaptation: SFT on specific datasets (IFEval, BBH) to test causal hypotheses

Training Data:

Used Open LLM Leaderboard data (N=3360 models with known base models)
SFT validation used IFEval dataset with GPT-4 generated responses for ground truth

Key Hyperparameters:

learning_rate: 2e-5
epochs: 3

Comparison to Prior Work

vs. PCA-based Scaling Laws: HCA identifies *causal* directed relationships, whereas PCA only finds orthogonal correlation-based factors. PCA fails to account for base model heterogeneity.
vs. CRL [JS24]: HCA is robust to 'inexact' SCMs (where source variables might be slightly entangled) and provides a specific algorithm for hierarchical structures.
vs. IRT: HCA does not require hand-crafted modeling assumptions or full likelihoods, learning the structure directly from data heterogeneity.

Limitations

Analysis is restricted to a subset of base models (Llama-3, Qwen2.5 families) that share a similar Principal Component subspace.
Relies on the assumption that benchmark performance is a linear transformation of latent capabilities.
Interpreting the latent factors (z1, z2, z3) requires qualitatively matching them to benchmark semantics; correlations support these labels but do not by themselves establish them.

Reproducibility

Code: https://github.com/hlzhang109/causal-eval

Relies on public Open LLM Leaderboard data. Validation SFT experiments use standard hyperparameters.

📊 Experiments & Results

Evaluation Setup

Analysis of Open LLM Leaderboard data (v2) containing 3360 models across 6 benchmarks.

Benchmarks:

IFEval (Instruction Following)
BBH (BIG-Bench Hard; Reasoning)
MATH Lvl 5 (Hard Math Problems)
GPQA (Graduate-Level Reasoning)
MUSR (Multistep Soft Reasoning)
MMLU-PRO (General Knowledge & Reasoning)

Metrics:

Accuracy
R^2 (for OLS fit of factors to benchmarks)
Maximum Inexactness Coefficient (MIC)
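The R^2 metric reports how well an OLS fit links a recovered factor to a benchmark. A minimal NumPy version, assuming an intercept is included; the summary does not say which variable is regressed on which, but with a single regressor the two directions give the same R^2 (the squared Pearson correlation). `ols_r2` is a hypothetical helper.

```python
import numpy as np

def ols_r2(y, X):
    """R^2 of an OLS fit of y on the columns of X plus an intercept."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # single regressor -> one column
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

# Synthetic factor scores a and a noisy benchmark b.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = 0.8 * a + rng.normal(scale=0.5, size=200)
r2_ab = ols_r2(b, a)
```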

Key Results

Benchmark	Metric	Baseline	This Paper	Δ
The recovered latent factors (z1, z2, z3) align strongly with specific benchmark categories.
BBH	R^2	0	0.94	0.94
IFEval	R^2	0	0.90	0.90
MATH Lvl 5	R^2	0	1.00	1.00
Recovered Causal Structure shows specific edge weights between factors for different base models.
Latent Graph	Edge Weight (z1 -> z2)	Not reported in the paper	2.76	Not reported in the paper
Latent Graph	Edge Weight (z2 -> z3)	Not reported in the paper	0.27	Not reported in the paper
Intervention experiments validate the causal link z2 (Instruction Following) -> z3 (Math).
MATH	Accuracy	0.29	0.32	+0.03

Experiment Figures

Recovered causal graphs (DAGs) for four specific base models (Llama-3-8B, Llama-3.1-8B, Qwen2.5-7B, Qwen2.5-14B) with edge weights.

Unmixing matrix heatmaps and regression plots of latent factors against benchmarks.

Main Takeaways

Capabilities follow a hierarchy: General Problem Solving (z1) -> Instruction Following (z2) -> Math (z3).
Base models introduce significant heterogeneity; causal discovery requires controlling for this using multi-domain analysis.
Factor z1 (General Problem Solving) is heavily determined by pre-training compute (follows scaling laws) and is hard to improve via simple fine-tuning.
Improvements in Math (z3) observed in instruction-tuned models are often causal downstream effects of better instruction following (z2), not just better reasoning.
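The intervention logic behind the z2 → z3 takeaway can be illustrated with a toy linear SCM over the chain z1 → z2 → z3. The edge weights reuse the values from the Key Results table purely as illustrative numbers; the Laplace noise, sample size, and the `simulate` helper are hypothetical. Shifting z2 (the analogue of instruction-following SFT) moves z3 by the z2 → z3 weight times the shift while leaving z1 untouched, which is the pattern a downstream causal effect predicts.

```python
import numpy as np

W12, W23 = 2.76, 0.27  # illustrative edge weights for z1 -> z2 and z2 -> z3

def simulate(n, shift_z2=0.0, seed=0):
    """Sample the chain z1 -> z2 -> z3 with an optional shift intervention on z2."""
    rng = np.random.default_rng(seed)
    e1, e2, e3 = rng.laplace(size=(3, n))  # same seed -> same exogenous noise
    z1 = e1
    z2 = W12 * z1 + e2 + shift_z2          # shift intervention on z2
    z3 = W23 * z2 + e3
    return z1, z2, z3

base = simulate(10_000)
treated = simulate(10_000, shift_z2=1.0)
effect_on_z1 = treated[0].mean() - base[0].mean()  # upstream factor is unaffected
effect_on_z3 = treated[2].mean() - base[2].mean()  # equals W23 * shift
```

Reusing the same noise draws in both runs makes this a paired comparison, so the difference isolates the intervention effect exactly rather than up to sampling error.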

📚 Prerequisite Knowledge

Prerequisites

Structural Causal Models (SCM)
Independent Component Analysis (ICA)
Linear Algebra (PCA, Matrix Factorization)
LLM Post-training (SFT, RLHF)

Key Terms

HCA: Hierarchical Component Analysis—the proposed algorithm to recover latent causal factors and their hierarchical structure from multi-domain data.

SCM: Structural Causal Model—a framework where variables are generated by structural equations representing causal mechanisms (e.g., z2 = w*z1 + noise).

ICA: Independent Component Analysis, a computational method for separating a multivariate signal into additive, statistically independent (non-Gaussian) components.

MIC: Maximum Inexactness Coefficient—a measure quantifying how much the recovered source variables violate the independence assumption in an inexact SCM.

Latent Capability Factors: Unobserved variables (z1, z2, z3) representing abstract skills (e.g., 'Instruction Following') that generate observed benchmark scores.

Base Model Domain: A group of models derived from the same pre-trained checkpoint (e.g., all Llama-3-8B fine-tunes), treated as a distinct environment for causal learning.