The Geometry of Reasoning: Flowing Logics in Representation Space

📝 Paper Summary

Mechanistic Interpretability, Reasoning Dynamics, Representation Geometry

The paper models LLM reasoning as continuous geometric flows where logical structure acts as a differential controller governing the velocity and curvature of semantic trajectories, independent of surface content.

Core Problem

Current interpretations of LLMs often view reasoning as discrete token generation or random graph walks, failing to explain how models internalize deep logical structure independent of surface semantics.

Why it matters:

Challenging the 'stochastic parrot' view is essential to determine if LLMs genuinely understand logic or merely mimic surface forms
Lack of rigorous geometric frameworks limits our ability to quantify, steer, and ensure the safety of reasoning processes in latent space
Understanding the distinction between semantic content (position) and logical structure (velocity/curvature) is critical for robust interpretability

Concrete Example: A logical deduction (e.g., Modus Ponens) applied to 'weather' vs. 'sports' produces completely different raw embeddings. Position similarity alone suggests the two deductions are unrelated, missing the shared underlying logical 'movement' that a geometric flow analysis reveals.
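
The contrast in this example can be sketched with toy numbers. The 2-D points below are made up for illustration and are not model outputs; they stand in for two trajectories that sit in different regions of space but take the same sequence of moves.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def velocities(traj):
    """Step-to-step displacement vectors of a trajectory."""
    return [tuple(b - a for a, b in zip(p, q)) for p, q in zip(traj, traj[1:])]

# Toy "embeddings" (hypothetical): same moves, different regions of space.
weather = [(5.0, 0.0), (5.0, 1.0), (6.0, 1.0)]
sports = [(-5.0, 0.0), (-5.0, 1.0), (-4.0, 1.0)]

position_sims = [cosine(p, q) for p, q in zip(weather, sports)]
velocity_sims = [cosine(u, v) for u, v in zip(velocities(weather), velocities(sports))]

print("position:", [round(s, 2) for s in position_sims])
print("velocity:", [round(s, 2) for s in velocity_sims])
```

In this toy setup the position similarities are all negative while the velocity similarities are all 1, because the second path is a translated copy of the first.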

Key Novelty

Reasoning Flow Framework

Models reasoning as a cumulative trajectory (flow) on a concept manifold, where local velocity and curvature are dictated by logical rules rather than semantic topics
Proposes that logic functions as a 'steering wheel' (differential constraint) that determines the turning and speed of the reasoning path, while semantic content determines the location
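
One way to read the position/velocity split is through finite differences along the trajectory. The notation here is an assumption for illustration, since the summary does not reproduce the paper's symbols: $y_t$ is the embedding after reasoning step $t$, $v_t$ is its discrete velocity, and $c$ is an arbitrary constant offset.

```latex
\[
  v_t = y_{t+1} - y_t ,
  \qquad
  y_T = y_1 + \sum_{t=1}^{T-1} v_t ,
  \qquad
  (y_{t+1} + c) - (y_t + c) = v_t .
\]
```

Under this reading, the starting point $y_1$ (and any constant offset $c$) carries where the path lives, while the increments $v_t$ carry how it moves. A constant shift changes every position but leaves every velocity untouched.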

Architecture

Schematic of the mapping relationships between Input Space X, Concept Space C, Logic Space L, and Representation Space R.

Evaluation Highlights

Logic similarity in Qwen3-0.6B increases from 0.26 (position) to 0.53 (curvature), showing logic is encoded in higher-order geometry
Random shuffling of logical steps collapses curvature similarity to 0.02, indicating that the order of the trajectory is structurally significant
Consistent geometric patterns observed across model families (Qwen, LLaMA) and scales (0.5B to 8B), suggesting a universal representational law

Breakthrough Assessment

8/10

Offers a strong theoretical formalization of reasoning as geometry, backed by empirical evidence that successfully disentangles logic from semantics. Provides a new lens for interpretability beyond static features.

⚙️ Technical Details

Problem Definition

Setting: Analyzing the geometric properties of context-cumulative representation trajectories generated by LLMs on controlled logical tasks

Inputs: Prompt P and reasoning chain X = (x_1, ..., x_T)

Outputs: Sequence of embeddings Y = (y_1, ..., y_T) analyzed for Velocity and Menger Curvature
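
A minimal sketch of how the inputs map to the output sequence, assuming each embedding is computed from the prompt plus all reasoning steps so far. The function names, the newline separator, and the `toy_embed` stand-in are hypothetical; a real run would call the target LLM instead.

```python
def cumulative_contexts(prompt, steps, sep="\n"):
    """Return one text per step: the prompt plus all steps up to and including it."""
    contexts = []
    current = prompt
    for step in steps:
        current = current + sep + step
        contexts.append(current)
    return contexts

def trajectory(prompt, steps, embed):
    """Map each cumulative context to a vector using a caller-supplied embed function."""
    return [embed(text) for text in cumulative_contexts(prompt, steps)]

def toy_embed(text):
    """Hypothetical stand-in for a model: (character count, word count)."""
    return [float(len(text)), float(len(text.split()))]

X = ["If it rains, the ground is wet.", "It rains.", "The ground is wet."]
Y = trajectory("Premises:", X, toy_embed)  # one vector y_t per reasoning step x_t
```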

Pipeline Flow

Dataset Generation: GPT-5 creates logic templates + semantic carriers
Inference: LLM processes sequences step-by-step
Extraction: Hidden states extracted via Representation Operator
Geometric Analysis: Compute Velocity and Curvature similarities

System Modules

Data Generator

Generate parallel reasoning tasks sharing logical skeletons but differing in topic/language

Model or implementation: GPT-5

Representation Extractor

Extract context-dependent hidden states for each reasoning step

Model or implementation: Qwen3 / LLaMA3 (target models)
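
The summary does not say how per-token hidden states are reduced to a single vector per reasoning step. Two common choices, shown here purely as assumptions, are mean pooling over tokens and taking the last token's state. The function names are hypothetical, and `hidden_states` stands for one layer's output as a list of per-token vectors.

```python
def mean_pool(hidden_states):
    """Average per-token vectors (num_tokens x dim) into one dim-sized vector."""
    if not hidden_states:
        raise ValueError("need at least one token vector")
    dim = len(hidden_states[0])
    count = len(hidden_states)
    return [sum(tok[i] for tok in hidden_states) / count for i in range(dim)]

def last_token(hidden_states):
    """Use the final token's vector as the step representation."""
    if not hidden_states:
        raise ValueError("need at least one token vector")
    return list(hidden_states[-1])

# Two tokens with 2-D hidden states (toy values, not model outputs).
states = [[1.0, 2.0], [3.0, 4.0]]
pooled = mean_pool(states)
final = last_token(states)
```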

Geometric Analyzer

Compute differential geometric quantities on the extracted trajectories

Model or implementation: Mathematical functions (Velocity, Menger Curvature)
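
The two quantities named above can be sketched in plain Python. This is an illustrative implementation under the assumption that velocity is the difference of consecutive embeddings and that curvature is evaluated on each consecutive triple of points; it is not taken from the paper's repository, and returning 0.0 for coincident points is a convention chosen here.

```python
import math

def sub(p, q):
    return [a - b for a, b in zip(p, q)]

def norm(v):
    return math.sqrt(sum(a * a for a in v))

def velocities(traj):
    """First differences y[t+1] - y[t] along the trajectory."""
    return [sub(q, p) for p, q in zip(traj, traj[1:])]

def menger_curvature(a, b, c):
    """Reciprocal circumradius of the triangle (a, b, c): 4 * area / product of sides."""
    u, v = sub(b, a), sub(c, a)
    dot = sum(x * y for x, y in zip(u, v))
    # Twice the triangle area via the Gram determinant, clamped against rounding.
    twice_area = math.sqrt(max(norm(u) ** 2 * norm(v) ** 2 - dot ** 2, 0.0))
    sides = norm(sub(b, a)) * norm(sub(c, b)) * norm(sub(a, c))
    return 0.0 if sides == 0.0 else 2.0 * twice_area / sides

def curvature_profile(traj):
    """Menger curvature at every interior point of the trajectory."""
    return [menger_curvature(a, b, c) for a, b, c in zip(traj, traj[1:], traj[2:])]
```

As a sanity check, three points on a unit circle give curvature 1, and three collinear points give curvature 0.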

Modeling

Base Model: Qwen3 (0.6B, 1.7B, 4B), Qwen1.5/2 (0.5B), LLaMA3 (8B)

Comparison to Prior Work

vs. Linear Representation Hypothesis: Extends static linearity to dynamic flows on manifolds, incorporating curvature
vs. Graph-based Analysis: Models reasoning as continuous smooth trajectories governed by differential constraints rather than discrete graph hops

Limitations

Focuses exclusively on natural language understanding (NLU), not generation quality
Relies on synthetic formal logic datasets, so findings may not fully transfer to messy real-world reasoning
Assumes that a smooth manifold underlies the discrete tokens (supported by construction, but still a theoretical assumption)

Reproducibility

Code: https://github.com/MasterZhou1/Reasoning-Flow

Dataset available on Hugging Face. Full prompts for data generation are provided in the Appendix.

📊 Experiments & Results

Evaluation Setup

Analyze hidden state trajectories of 30 logical structures across 20 topics and 4 languages (en, zh, de, ja)

Benchmarks:

Reasoning-Flow Dataset (formal logic reasoning via natural deduction) [New]

Metrics:

Position Similarity (Cosine)
Velocity Similarity (Cosine of differences)
Curvature Similarity (Pearson correlation)

Statistical methodology: Not explicitly reported in the paper
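
The three similarity metrics listed above can be sketched as follows. The aggregation is an assumption: the summary names cosine and Pearson correlation but not how per-step values are combined, so this sketch averages cosines over aligned steps and correlates two curvature sequences of equal length. Zero vectors and constant sequences are not handled.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def diffs(traj):
    """First differences between consecutive points."""
    return [[b - a for a, b in zip(p, q)] for p, q in zip(traj, traj[1:])]

def position_similarity(traj_a, traj_b):
    """Mean cosine between aligned embeddings."""
    sims = [cosine(p, q) for p, q in zip(traj_a, traj_b)]
    return sum(sims) / len(sims)

def velocity_similarity(traj_a, traj_b):
    """Mean cosine between aligned first differences."""
    sims = [cosine(u, v) for u, v in zip(diffs(traj_a), diffs(traj_b))]
    return sum(sims) / len(sims)

def pearson(xs, ys):
    """Pearson correlation, e.g. between two curvature sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```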

Key Results

Comparison of geometric similarities on Qwen3-0.6B confirms that Position is dominated by semantics (Topic), while Curvature is dominated by Logic.

Benchmark	Metric	Baseline	This Paper	Δ
Reasoning-Flow	Position Similarity (Logic Group)	0.30	0.26	-0.04
Reasoning-Flow	Velocity Similarity (Logic Group)	0.07	0.17	+0.10
Reasoning-Flow	Curvature Similarity (Logic Group)	0.11	0.53	+0.42

Random Shuffle baseline demonstrates that the geometric structure is order-dependent and not a bag-of-words property.

Benchmark	Metric	Original Order	Shuffled	Δ
Reasoning-Flow	Curvature Similarity (Logic Group)	0.53	0.02	-0.51

Model scaling results show consistency across sizes.

Benchmark	Metric	Baseline	This Paper	Δ
Reasoning-Flow	Curvature Similarity (Logic Group)	0.53	0.53	0.00

Experiment Figures

Heatmaps of similarity matrices for Position, Velocity, and Curvature on Qwen3 0.6B across 5 logic templates (A-E) instantiated with different topics.

Main Takeaways

Logical structure is encoded in the higher-order geometry (velocity and curvature) of representation space, not the raw positions.
Raw positions (the embeddings themselves) are dominated by surface semantics (topics/languages), clustering by content rather than reasoning form.
The 'Random Shuffle' experiment confirms that reasoning flows are dynamic processes dependent on order, not static feature sets.
Geometric patterns of reasoning appear universal across different model families (Qwen, LLaMA) and sizes, suggesting a fundamental representational law.
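
The Random Shuffle control mentioned in the takeaways can be sketched at the text level: permute the reasoning steps, then rebuild the trajectory from the permuted sequence. The helper name and the seeding scheme are hypothetical, and the summary does not specify how the paper draws its permutations.

```python
import random

def shuffled_steps(steps, seed=0):
    """Return the same steps in a different order (identity order is rejected)."""
    if len(steps) < 2:
        return list(steps)
    rng = random.Random(seed)
    order = list(range(len(steps)))
    # Reshuffle until the permutation actually moves at least one step.
    while order == sorted(order):
        rng.shuffle(order)
    return [steps[i] for i in order]

steps = ["If A then B.", "A.", "Therefore B."]
control = shuffled_steps(steps, seed=0)

# Same bag of steps, different order: any drop in similarity must come from order.
print(control)
```

Because the shuffled sequence contains exactly the same steps, a collapse in curvature similarity on the shuffled input cannot be explained by which sentences are present, only by their order.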

📚 Prerequisite Knowledge

Prerequisites

Differential geometry (curves, tangents, curvature)
Linear Representation Hypothesis
Formal logic (Natural Deduction)

Key Terms

Menger Curvature: A curvature estimate defined by three points, equal to the reciprocal of the radius of the circle passing through them (the circumcircle); it quantifies how much a curve deviates from a straight line

Natural Deduction: A logic system where reasoning proceeds via inference rules (like 'Introduction' and 'Elimination' of connectives) rather than axioms

Reasoning Flow: The trajectory of embeddings created by accumulating context step-by-step during a chain-of-thought process

Semantic Carrier: The surface content (e.g., topic, language) that 'carries' the abstract logical structure

Context-Cumulative Flow: A sequence of embeddings where the embedding at each step is computed over the prompt and all reasoning steps so far

Representation Operator: The mapping function (e.g., an LLM's encoder or specific layer) that converts discrete tokens into continuous vector embeddings
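
For reference, the Menger curvature entry in the key terms above can be written out explicitly. This is the standard three-point formula; applying it to consecutive triples of the embeddings $y_t$ is an assumption about how it is used along a trajectory.

```latex
\[
  \kappa(a, b, c) = \frac{1}{R}
  = \frac{4\,A(a, b, c)}{\lVert a - b \rVert \, \lVert b - c \rVert \, \lVert c - a \rVert},
  \qquad
  A(a, b, c) = \tfrac{1}{2}\sqrt{\lVert u \rVert^{2} \lVert v \rVert^{2} - \langle u, v \rangle^{2}},
\]
\[
  u = b - a, \quad v = c - a, \qquad
  \kappa_t = \kappa(y_{t-1}, y_t, y_{t+1}).
\]
```

Here $R$ is the circumradius and $A$ is the area of the triangle spanned by the three points. Collinear points give $A = 0$ and therefore zero curvature.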