Task-Conditioned Routing Signatures in Sparse Mixture-of-Experts Transformers

📝 Paper Summary

Sparse Mixture-of-Experts (MoE), Mechanistic Interpretability, Internal Model Representations

Routing signatures—vectors summarizing expert activation patterns—reveal that sparse MoE transformers systematically route tokens to different experts based on task category, enabling high-accuracy task classification solely from routing telemetry.

Core Problem

The internal routing behavior of sparse MoE models is poorly understood, often treated merely as a load-balancing mechanism rather than a meaningful signal of how computation is allocated across tasks.

Why it matters:

Routing is central to interpretability: if tasks use distinct experts, routing offers a tractable view into modular computation
Debugging: abnormal routing may signal expert collapse or drift in deployed systems
Scientific understanding: determining whether sparse models implement different computation pathways for different tasks is key to understanding neural modularity

Concrete Example: A code-generation prompt might activate a specific set of experts in deep layers, while a creative story prompt activates a different set. Current analysis treats these as random or balanced noise, missing the structural connection between the task type and the physical experts chosen.

Key Novelty

Routing Signatures for Task Analysis

Introduces 'routing signatures': compact vector representations that summarize the frequency of expert usage across all layers for a specific prompt
Demonstrates that these signatures are not random but cluster strongly by task, exceeding what load-balancing alone would predict
Shows that a simple linear classifier (logistic regression) can predict the task type (Code, Math, Story, or Factual) with 92.5% accuracy using only these routing patterns

Evaluation Highlights

Within-category routing similarity (0.8435) significantly exceeds across-category similarity (0.6225), confirming strong task clustering
A logistic regression classifier achieves 92.5% accuracy in 4-way task classification using only routing signatures
Task separation peaks in deeper layers (around layer 13), suggesting routing specialization increases with network depth

Breakthrough Assessment

7/10

Provides compelling empirical evidence that MoE routing is semantic and task-conditioned, not just a load-balancing artifact. The methodology is simple but the insight is fundamental for MoE interpretability.

⚙️ Technical Details

Problem Definition

Setting: Analyzing the expert selection distribution of a pre-trained Sparse MoE model given diverse prompts

Inputs: Prompt x belonging to a specific task category (Code, Math, Story, Factual)

Outputs: Routing signature S(x) (vector of normalized expert activation counts across layers)

Pipeline Flow

Inference (Run prompt through MoE model)
Telemetry Extraction (Record active experts per token/layer)
Signature Computation (Aggregate counts into normalized vectors)
Analysis (Compute similarity, classify tasks)
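The signature-computation step of the pipeline above can be sketched roughly as follows. This is a minimal illustration, not the paper's MoE-Xray code: the trace layout (`trace[t][l]` holding the expert indices chosen for token `t` at layer `l`) and the function name `routing_signature` are assumptions made for the example. Per-layer normalization follows the definition of a routing signature given in this summary.

```python
def routing_signature(trace, num_layers=16, num_experts=64):
    """Aggregate per-token routing decisions into one flat, per-layer-normalized vector.

    trace[t][l] is assumed to be the list of expert indices selected for
    token t at layer l (a hypothetical telemetry format).
    """
    counts = [[0] * num_experts for _ in range(num_layers)]
    for token_routes in trace:
        for layer, experts in enumerate(token_routes):
            for expert in experts:
                counts[layer][expert] += 1

    signature = []
    for layer_counts in counts:
        total = sum(layer_counts)
        # Normalize within each layer so every layer's block sums to 1
        # (an unused layer stays all-zero instead of dividing by zero).
        signature.extend(c / total if total else 0.0 for c in layer_counts)
    return signature


# Toy example: 2 tokens, 2 layers, 4 experts, top-2 routing.
toy_trace = [
    [[0, 1], [2, 3]],  # token 0: experts per layer
    [[0, 2], [2, 3]],  # token 1: experts per layer
]
sig = routing_signature(toy_trace, num_layers=2, num_experts=4)
print(sig)  # [0.5, 0.25, 0.25, 0.0, 0.0, 0.0, 0.5, 0.5]
```

With the defaults (16 layers, 64 experts per layer) the vector has 16 × 64 = 1024 entries, which matches the 1024-dimensional classifier input listed under Modeling.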

System Modules

OLMoE-1B-7B-0125-Instruct

Generate tokens and routing decisions

Model or implementation: OLMoE-1B-7B-0125-Instruct (16 layers, 64 experts/layer, top-8 routing)

Signature Extractor (Analysis)

Convert raw traces into fixed-size routing signatures

Model or implementation: Deterministic aggregation

Task Classifier (Analysis)

Predict task category from routing signature

Model or implementation: Logistic Regression

Novel Architectural Elements

Routing Signatures: A specific vector formulation for aggregating and analyzing MoE routing decisions across layers

Modeling

Base Model: OLMoE-1B-7B-0125-Instruct

Training Method: Logistic Regression (for the analysis classifier only)

Adaptation: None (The MoE model is frozen; only the probe classifier is trained)

Trainable Parameters: Weights of the logistic regression classifier (1024 -> 4 classes)

Training Data:

80 prompts total
4 categories: Code, Math, Story, Factual
32 generated tokens per prompt

Key Hyperparameters:

k_experts: 8 (top-k routing)
total_experts: 64
layers: 16

Compute: Inference on commodity hardware (OLMoE-1B-7B is relatively small, with roughly 1B active parameters out of about 7B total)

Comparison to Prior Work

vs. Switch Transformers/GShard: Focuses on task-conditioned structure of routing (interpretability) rather than training stability or balancing losses
vs. Standard MoE Analysis: Uses 'routing signatures' to quantify prompt-level similarity, whereas most prior work looks at aggregate dataset-level statistics

Limitations

Evaluated on only one model (OLMoE-1B-7B) and a small dataset (80 prompts)
Analysis is correlational; does not intervene on experts to prove causal necessity
Short generation length (32 tokens) limits analysis of long-horizon dynamics

Reproducibility

MoE-Xray toolkit released (URL not explicitly in text). Model OLMoE is open weights. Dataset of 80 prompts described but not explicitly linked as a file.

📊 Experiments & Results

Evaluation Setup

Inference on 80 prompts across 4 categories, collecting expert activation traces

Benchmarks:

Custom Prompt Set (multi-task: Code, Math, Story, Factual) [New]

Metrics:

Cosine Similarity (between routing signatures)
Classification Accuracy (predicting task from signature)
Cohen's d (effect size of separation)
Statistical methodology: 5-fold stratified cross-validation for classifier; Mean/Std dev reporting for similarity
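The classification protocol above (a linear probe scored with 5-fold stratified cross-validation) might look something like the sketch below. To stay dependency-free it uses a small from-scratch softmax regression trained by batch gradient descent. That is a stand-in, not the authors' implementation, and the learning rate, epoch count, helper names, and synthetic data are all assumptions chosen for illustration.

```python
import math
import random


def stratified_folds(labels, k=5, seed=0):
    """Split example indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)  # deal each class round-robin
    return folds


def class_scores(W, b, x):
    return [sum(w * v for w, v in zip(W[c], x)) + b[c] for c in range(len(W))]


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train_softmax_regression(X, y, num_classes, epochs=200, lr=0.5):
    """Batch gradient descent on the multinomial logistic loss."""
    dim, n = len(X[0]), len(X)
    W = [[0.0] * dim for _ in range(num_classes)]
    b = [0.0] * num_classes
    for _ in range(epochs):
        grad_W = [[0.0] * dim for _ in range(num_classes)]
        grad_b = [0.0] * num_classes
        for x, target in zip(X, y):
            probs = softmax(class_scores(W, b, x))
            for c in range(num_classes):
                err = probs[c] - (1.0 if c == target else 0.0)
                grad_b[c] += err
                for d in range(dim):
                    grad_W[c][d] += err * x[d]
        for c in range(num_classes):
            b[c] -= lr * grad_b[c] / n
            for d in range(dim):
                W[c][d] -= lr * grad_W[c][d] / n
    return W, b


def predict(W, b, x):
    scores = class_scores(W, b, x)
    return max(range(len(scores)), key=scores.__getitem__)


def cross_validated_accuracy(X, y, num_classes, k=5):
    correct = 0
    for held_out in stratified_folds(y, k):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        W, b = train_softmax_regression([X[i] for i in train], [y[i] for i in train], num_classes)
        correct += sum(predict(W, b, X[i]) == y[i] for i in held_out)
    return correct / len(X)


# Synthetic stand-in for routing signatures: each class favors one "expert".
rng = random.Random(1)
X, y = [], []
for label in range(4):
    for _ in range(5):
        vec = [rng.uniform(0.0, 0.1) for _ in range(4)]
        vec[label] += 1.0
        total = sum(vec)
        X.append([v / total for v in vec])
        y.append(label)

accuracy = cross_validated_accuracy(X, y, num_classes=4)
print(f"cross-validated accuracy: {accuracy:.2f}")
```

On real 1024-dimensional signatures a library implementation of logistic regression would be the more practical choice; the pure-Python loop here only makes the protocol explicit.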

Key Results

Routing similarity analysis shows that prompts within the same task category activate much more similar experts than prompts across different categories, and this effect exceeds random baselines. For cosine similarity, the baseline column is the across-category similarity and the "This Paper" column is the within-category similarity; for accuracy, 25.0% is chance level for 4-way classification.

Benchmark	Metric	Baseline	This Paper	Δ
Custom Prompt Set	Cosine Similarity	0.6225	0.8435	+0.2210
Custom Prompt Set	Cohen's d	0	1.44	+1.44
Custom Prompt Set	Accuracy (%)	25.0	92.5	+67.5
Custom Prompt Set	Macro F1	0.25	0.93	+0.68
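The similarity and effect-size rows above could be computed along the following lines. This is only a sketch: the function names are invented for the example, the toy signatures are made up, and the pooled-standard-deviation form of Cohen's d is an assumption, since the summary does not state which variant the paper uses.

```python
import math
from itertools import combinations
from statistics import mean, stdev


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pairwise_similarities(signatures, labels):
    """Split all pairwise cosine similarities into same-task and cross-task groups."""
    within, across = [], []
    for i, j in combinations(range(len(signatures)), 2):
        sim = cosine(signatures[i], signatures[j])
        (within if labels[i] == labels[j] else across).append(sim)
    return within, across


def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation.

    Needs at least two values per group and a nonzero pooled variance.
    """
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


# Toy signatures (4 "experts", one layer) for two task categories.
signatures = [
    [0.5, 0.5, 0.0, 0.0],  # code
    [0.6, 0.3, 0.1, 0.0],  # code
    [0.0, 0.1, 0.5, 0.4],  # story
    [0.1, 0.0, 0.3, 0.6],  # story
]
labels = ["code", "code", "story", "story"]

within, across = pairwise_similarities(signatures, labels)
print("mean within-category similarity:", mean(within))
print("mean across-category similarity:", mean(across))
print("Cohen's d:", cohens_d(within, across))
```

Running the same comparison on one layer's slice of the signature at a time would give a layer-wise effect size of the kind shown in the figures.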

Experiment Figures

Heatmap of routing signature similarity across 4 task categories

Layer-wise effect size (Cohen's d) of task separation

PCA projection of routing signatures

Main Takeaways

Routing is task-conditioned: Prompts from the same category induce highly similar expert activations.
Structure exceeds balancing: The observed clustering is stronger than what is predicted by a load-balancing baseline (which preserves layer-wise counts but randomizes selection).
Depth matters: Task separation in routing signatures increases in deeper layers, peaking around layer 13, suggesting specialization emerges with network depth.
Linearly separable: Task identity can be recovered from routing patterns using simple linear classifiers, indicating that the signal is robust and easy to read out.
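The load-balancing baseline mentioned in the takeaways is only described in one phrase, so its exact construction is not recoverable from this summary. One possible reading, shown below purely as an assumption, is to keep each layer's set of usage frequencies but randomly reassign which expert receives which frequency. The function name and the toy signature are hypothetical.

```python
import random


def shuffle_experts_within_layers(signature, num_layers, num_experts, rng):
    """Permute expert identities independently in each layer.

    Each layer keeps the same multiset of usage frequencies (so per-layer
    totals are unchanged), but which expert carries which frequency is random.
    """
    shuffled = []
    for layer in range(num_layers):
        block = signature[layer * num_experts:(layer + 1) * num_experts]
        rng.shuffle(block)  # the slice is already a copy, so the input is untouched
        shuffled.extend(block)
    return shuffled


rng = random.Random(0)
observed = [0.5, 0.25, 0.25, 0.0, 0.0, 0.0, 0.5, 0.5]  # 2 layers x 4 experts
baseline = shuffle_experts_within_layers(observed, num_layers=2, num_experts=4, rng=rng)
print(baseline)
```

Comparing within-category and across-category similarity on many such shuffled signatures would give a null distribution against which the observed clustering can be judged.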

📚 Prerequisite Knowledge

Prerequisites

Understanding of Sparse Mixture-of-Experts (MoE) architecture
Familiarity with Transformer feed-forward blocks
Basic linear algebra (cosine similarity, PCA)

Key Terms

Routing Signature: A vector summarizing how often each expert is activated across all layers for a given prompt, normalized per layer

MoE: Mixture-of-Experts—a neural architecture where only a subset of parameters (experts) are applied to each input token

Top-k routing: A strategy where the router selects the 'k' experts with the highest probability scores for each token

Cohen's d: A statistical effect size measure indicating the standardized difference between two means

Conditional Computation: The paradigm where the model activates different parts of its network for different inputs

Load balancing: The requirement that experts be chosen roughly equally often to prevent computational bottlenecks or 'expert collapse'

OLMoE: A specific open-weights Mixture-of-Experts language model used as the testbed in this paper
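As a concrete illustration of the top-k routing entry above, the sketch below applies a softmax to one token's router logits and keeps the k highest-scoring experts. The function name and the example logits are invented for illustration; whether and how the selected weights are renormalized differs between implementations and is not covered here.

```python
import math


def top_k_route(router_logits, k):
    """Return the indices and softmax probabilities of the k highest-scoring experts."""
    peak = max(router_logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return chosen, [probs[i] for i in chosen]


# One token, 4 experts, top-2 routing.
experts, weights = top_k_route([0.1, 2.0, -1.0, 1.5], k=2)
print(experts)  # [1, 3]
```

In the model studied here the same selection would run over 64 experts with k = 8 at each of the 16 layers, and those per-token selections are what a routing signature aggregates.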