
Characterizing the Expressivity of Fixed-Precision Transformer Language Models

Unknown authors · ETH Zurich · OpenReview · Reasoning Benchmark

📝 Paper Summary

Theoretical expressivity of neural networks · Formal language theory
Fixed-precision transformers with soft attention and no positional encodings are exactly as expressive as first-order logic with two variables and past operators, corresponding to the subclass of star-free languages recognized by partially ordered DFAs.
Core Problem
The theoretical expressive power of transformers is often analyzed under unrealistic idealizations (arbitrary precision, unique hard attention) that overestimate their capabilities compared to practical fixed-precision implementations.
Why it matters:
  • Current theory suggests transformers can recognize complex languages (e.g., Dyck languages) that practical models fail to learn, creating a gap between theory and practice
  • Understanding exact limits is crucial for knowing which tasks transformers fundamentally cannot solve regardless of data size or training time
Concrete Example: Theoretical models with arbitrary precision can recognize Dyck languages (balanced parentheses). In contrast, the paper shows that a fixed-precision transformer fundamentally cannot recognize 'unambiguous polynomials' like RDP-1, because it cannot distinguish automaton states that are not partially ordered.
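To make the automata-theoretic class concrete, here is a minimal illustrative sketch (not taken from the paper) of a partially ordered DFA: a DFA whose states can be ordered so that no transition ever returns to an earlier state. The example language, "strings over {a, b} containing at least one 'a'", is a simple star-free language chosen for illustration.

```python
# Illustrative PODFA sketch: states 0 <= 1, and every transition moves to
# an equal-or-later state, so once state 0 is left it is never revisited.
# The language and state names here are hypothetical examples, not the
# paper's RDP/LDP benchmark languages.

TRANSITIONS = {
    (0, 'a'): 1, (0, 'b'): 0,  # state 0: still waiting to see an 'a'
    (1, 'a'): 1, (1, 'b'): 1,  # state 1: an 'a' has been seen (absorbing)
}
ACCEPTING = {1}


def accepts(word):
    """Run the PODFA over the word and report acceptance."""
    state = 0
    for ch in word:
        state = TRANSITIONS[(state, ch)]
    return state in ACCEPTING


print(accepts("bba"))  # True: contains an 'a'
print(accepts("bbb"))  # False: no 'a' seen
```

A language like RDP-1 falls outside this class precisely because its minimal DFA needs transitions that cycle back to earlier states, which no such ordering can accommodate.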
Key Novelty
Exact logical characterization of fixed-precision soft-attention transformers
  • Establishes an equivalence between these transformers and LTL[P] (Linear Temporal Logic with only past operators)
  • Connects this logic to algebraic and automata-theoretic classes: R-trivial monoids and Partially Ordered DFAs (PODFAs)
  • Proves that soft attention under fixed precision has a bounded attention span, limiting it to 'local' decisions similar to moving a window over the input
Evaluation Highlights
  • Transformers achieve 100% accuracy on all languages within the LTL[P] class (e.g., LDP-1, LDP-2)
  • Transformers consistently fail (max 90.0% accuracy, mean 71.1%) on RDP-1, a language just outside the characterized class, despite it being star-free
  • Empirical probing confirms transformers learn linearly separable states for LTL[P] languages but fail to separate states for non-LTL[P] languages
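The probing result above rests on a standard technique: train a linear classifier on frozen hidden states and treat high accuracy as evidence of linear separability. The toy sketch below (synthetic data, a plain perceptron; none of it is the paper's actual setup) shows the idea.

```python
import numpy as np

# Toy linear probe (illustrative only): fake "hidden states" for two
# automaton states are drawn from two well-separated Gaussian clusters,
# then a perceptron is fit as the probe. High accuracy indicates the
# representations are (approximately) linearly separable.

rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(-1.0, 0.3, (50, 8)),  # hidden states for automaton state 0
    rng.normal(1.0, 0.3, (50, 8)),   # hidden states for automaton state 1
])
y = np.array([0] * 50 + [1] * 50)

# Perceptron training loop: update weights only on misclassified points.
w = np.zeros(8)
b = 0.0
for _ in range(20):
    for xi, yi in zip(X, y):
        pred = 1 if xi @ w + b > 0 else 0
        if pred != yi:
            step = 1 if yi == 1 else -1
            w += step * xi
            b += step

acc = np.mean([(1 if xi @ w + b > 0 else 0) == yi for xi, yi in zip(X, y)])
print(f"probe accuracy: {acc:.2f}")
```

For a non-LTL[P] language, the analogous probe would fail to separate the states, which is exactly the failure mode the paper reports.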
Breakthrough Assessment
9/10
Provides the first exact characterization of transformers under the realistic 'fixed-precision soft-attention' assumption, closing a major gap between theoretical upper bounds and empirical reality.