Evaluation Setup
Theoretical analysis and algorithmic calculation of depth bounds for specific architectures
Benchmarks:
- Gemma 3 Family (Architecture Analysis)
- Mixture-of-Experts vs Dense (Architecture Analysis)
Metrics:
- Opaque Serial Depth (Numeric Upper Bound)
- Asymptotic Depth Complexity
Key Results
Manual calculations of opaque serial depth for the Gemma 3 family show how depth scales with model size.

| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| Gemma 3 1B | Opaque Serial Depth | Not reported | 124 | Not reported |
| Gemma 3 4B | Opaque Serial Depth | Not reported | 208 | Not reported |
| Gemma 3 12B | Opaque Serial Depth | Not reported | 280 | Not reported |
| Gemma 3 27B | Opaque Serial Depth | Not reported | 376 | Not reported |
Main Takeaways
- Standard Transformers have bounded opaque serial depth, O(L(log T + log D)) for L layers, sequence length T, and hidden width D, supporting the hypothesis that they *must* use Chain of Thought (CoT) for hard serial tasks.
- Recurrent architectures (RNNs) allow serial depth to grow with sequence length, O((L + T) log D), potentially letting them hide reasoning and bypass CoT monitoring.
- Mixture-of-Experts models likely have lower opaque serial depth than equivalent dense models because they activate fewer parameters/paths per token.
- The definition of 'interpretable' is crucial: treating continuous latent states as uninterpretable drastically increases the opaque serial depth of architectures that reason in latent space.
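The asymptotic contrast in the takeaways above can be sketched numerically. This is an illustrative toy, not the paper's calculation: the constants hidden by the big-O are dropped, and the layer count `L` and width `D` below are hypothetical values, not those of any Gemma 3 model.

```python
import math

def transformer_depth(L, T, D):
    """Illustrative O(L * (log T + log D)) bound for a standard Transformer:
    each of the L layers contributes a log-depth reduction over T tokens
    (attention) plus a log-depth reduction over width D (MLP)."""
    return L * (math.log2(T) + math.log2(D))

def recurrent_depth(L, T, D):
    """Illustrative O((L + T) * log D) bound for a recurrent architecture:
    serial depth grows with sequence length, not just layer count."""
    return (L + T) * math.log2(D)

# Hypothetical configuration (not taken from the paper).
L, D = 32, 4096
for T in (1024, 8192, 65536):
    print(f"T={T:6d}  transformer={transformer_depth(L, T, D):10.0f}"
          f"  recurrent={recurrent_depth(L, T, D):10.0f}")
```

Under these toy numbers the Transformer bound grows only logarithmically with T, while the recurrent bound grows linearly, which is the mechanism behind the "RNNs can hide more serial reasoning" takeaway.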