
Dissecting Chronos: Sparse Autoencoders Reveal Causal Feature Hierarchies in Time Series Foundation Models

Anurag Mishra
Rochester Institute of Technology
arXiv (2026)
Pretraining Benchmark

📝 Paper Summary

Mechanistic Interpretability · Time Series Forecasting
Applying sparse autoencoders to the Chronos-T5 model reveals that mid-layer change-detection features are causally critical for forecasting, whereas final-layer semantic features are redundant or even detrimental.
Core Problem
Time Series Foundation Models (TSFMs) are increasingly deployed in high-stakes domains, yet they remain 'black boxes': their internal decision-making processes are opaque.
Why it matters:
  • Lack of transparency in high-stakes forecasting (e.g., energy, finance) creates risk
  • Prior interpretability methods for time series were post-hoc (saliency maps) rather than mechanistic, failing to explain internal computation
  • It is unknown whether TSFMs rely on robust causal features or spurious correlations
Concrete Example: When predicting a time series with a sudden level shift, it is unclear if the model detects the shift explicitly or relies on periodic patterns. This study shows that ablating a single mid-layer feature (ID 4616) causes a massive forecast error (CRPS spike of 38.61), proving the model's 'catastrophic dependence' on specific internal change-detection circuits.
Key Novelty
First application of Sparse Autoencoders (SAEs) to a Time Series Foundation Model
  • Train TopK Sparse Autoencoders on the internal activations of Chronos-T5 to decompose dense representations into interpretable features
  • Map these features to a taxonomy of temporal concepts (trends, seasonality, level shifts) using synthetic data correlations
  • Validate feature importance via causal ablation (zeroing out features) to measure their direct impact on forecast accuracy (CRPS)
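The three steps above (TopK sparse coding, then causal ablation of individual latent features) can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's implementation: the dimensions, weights, and the helper names `topk_sae_encode` and `ablate_feature` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def topk_sae_encode(x, W_enc, b_enc, k):
    """Encode activations, keeping only each row's top-k latent features (TopK SAE)."""
    z = np.maximum(x @ W_enc + b_enc, 0.0)                  # ReLU pre-activations
    thresh = np.partition(z, -k, axis=-1)[..., -k][..., None]  # k-th largest per row
    return np.where(z >= thresh, z, 0.0)                    # zero everything below it

def topk_sae_decode(z, W_dec, b_dec):
    """Reconstruct the dense activation from the sparse code."""
    return z @ W_dec + b_dec

def ablate_feature(z, feature_id):
    """Causal ablation: zero out one latent feature before decoding."""
    z = z.copy()
    z[..., feature_id] = 0.0
    return z

# Toy dimensions (illustrative only; not Chronos-T5's actual sizes)
d_model, d_sae, k = 16, 64, 8
W_enc = rng.normal(size=(d_model, d_sae)) / np.sqrt(d_model)
W_dec = rng.normal(size=(d_sae, d_model)) / np.sqrt(d_sae)
b_enc, b_dec = np.zeros(d_sae), np.zeros(d_model)

acts = rng.normal(size=(4, d_model))        # stand-in for residual-stream activations
z = topk_sae_encode(acts, W_enc, b_enc, k)  # sparse, interpretable feature code
recon = topk_sae_decode(z, W_dec, b_dec)
recon_ablated = topk_sae_decode(ablate_feature(z, 5), W_dec, b_dec)
```

In the paper's pipeline, `recon_ablated` would be patched back into the model and the resulting change in CRPS measured; a large degradation marks the ablated feature as causally critical.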
Evaluation Highlights
  • 100% of 392 ablated features produced positive CRPS degradation, confirming universal causal relevance of SAE features
  • Mid-encoder (Block 11) features are the most critical, with a single feature causing a +38.61 CRPS degradation (max-to-median ratio of 30.5x)
  • Final-encoder (Block 23) progressive ablation paradoxically improves forecast quality (CRPS decreases from 3.62 to 2.73), suggesting overfitting or redundancy
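The CRPS figures quoted above score probabilistic forecasts against observed values. For context, here is the standard sample-based CRPS estimator, CRPS ≈ E|X − y| − ½·E|X − X′|, in numpy; this is the generic formula, not the paper's evaluation code.

```python
import numpy as np

def crps_ensemble(samples, y):
    """Sample-based CRPS estimate: E|X - y| - 0.5 * E|X - X'|.

    `samples` are draws from the forecast distribution, `y` the observed value.
    Lower is better; a perfect point forecast scores 0.
    """
    samples = np.asarray(samples, dtype=float)
    term1 = np.abs(samples - y).mean()                       # accuracy term
    term2 = np.abs(samples[:, None] - samples[None, :]).mean()  # sharpness term
    return term1 - 0.5 * term2

# An ensemble straddling the truth is penalized for its spread:
crps_ensemble([2.0, 4.0], 3.0)   # → 0.5
```

Under this metric, the Block 23 result is striking: removing features lowers CRPS from 3.62 to 2.73, i.e. the forecasts genuinely improve.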
Breakthrough Assessment
8/10
Significant methodology transfer (SAEs to Time Series) with counterintuitive findings about layer hierarchy (mid-layer bottleneck). Strong causal validation, though limited to one model and benchmark.