Constructing the Umwelt: Cognitive Planning through Belief-Intent Co-Evolution

Shiyao Sang
arXiv (2025)
MM Agent Reasoning Benchmark

📝 Paper Summary

End-to-End Autonomous Driving, World Models, Cognitive AI
TIWM replaces dense world reconstruction with a sparse, subjective 'Umwelt' of intent tokens that co-evolve with belief states, aligning internal representations with physical affordances rather than pixel fidelity.
Core Problem
Current end-to-end planners rely on dense reconstruction of the environment, which introduces computational redundancy and 'cognitive myopia': a focus on short-term fidelity rather than long-term causal reasoning.
Why it matters:
  • Reconstruction is wasteful: Modeling task-irrelevant details (like pixel-level textures) introduces noise and high entropy unrelated to driving decisions
  • Density is biologically implausible: Biological systems optimize for sparse, task-relevant invariants (Umwelt) rather than objective reality (Umgebung)
  • Language models lack embodiment: LLMs describe the world but lack the proprioceptive sensory-motor closed loop required for real-time physical interaction
Concrete Example: A standard reconstruction-based model might waste resources predicting the texture of a sidewalk or the appearance of parked cars that are irrelevant to the path. TIWM, like a human driver, ignores these and focuses solely on 'affordances' such as the gap available for a lane change or an intersection that requires yielding.
Key Novelty
Tokenized Intent World Model (TIWM)
  • Constructs a subjective internal world (Umwelt) using sparse 'Intent Tokens' rather than reconstructing the objective visual scene
  • Belief-Intent Co-Evolution: Future intents are predicted from current beliefs, while gradients from the task-specific planning loss flow back into the belief encoder and shape it to focus only on task-relevant features
  • Cognitive Consistency: Implicitly aligns internal attention with physical world affordances without requiring explicit auxiliary reconstruction losses
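
The co-evolution idea in the bullets above can be illustrated with a deliberately tiny sketch. This is not the TIWM architecture: it assumes a linear belief encoder, a single scalar intent token, and a toy dataset in which one input feature is task-relevant and the other is not. All names (`train_toy_belief_intent`, `w_enc`, `w_int`, `w_dec`) are hypothetical. The only training signal is a planning loss, with no reconstruction term, and the gradients are written out by hand so the example needs no external libraries.

```python
def train_toy_belief_intent(steps=2000, lr=0.05):
    """Toy linear belief -> intent -> plan chain trained on planning loss only."""
    # Each sample: (dense observation, target plan value).
    # x[0] is task-relevant (e.g. gap size); x[1] is task-irrelevant
    # (e.g. sidewalk texture) and is uncorrelated with the target.
    data = [
        ((1.0, 1.0), 2.0),
        ((1.0, -1.0), 2.0),
        ((-1.0, 1.0), -2.0),
        ((-1.0, -1.0), -2.0),
    ]
    w_enc = [0.5, 0.5]  # belief encoder weights (one per input feature)
    w_int = 1.0         # belief -> intent token (assumed scalar)
    w_dec = 1.0         # intent token -> planned value

    for _ in range(steps):
        g_enc, g_int, g_dec = [0.0, 0.0], 0.0, 0.0
        for x, y in data:
            # Forward pass: observation -> belief -> intent -> plan.
            belief = w_enc[0] * x[0] + w_enc[1] * x[1]
            intent = w_int * belief
            plan = w_dec * intent
            err = plan - y  # derivative of 0.5 * err**2 w.r.t. plan
            # Backward pass: the planning error is the only gradient source.
            g_dec += err * intent
            g_int += err * w_dec * belief
            d_belief = err * w_dec * w_int
            g_enc[0] += d_belief * x[0]
            g_enc[1] += d_belief * x[1]
        n = len(data)
        w_dec -= lr * g_dec / n
        w_int -= lr * g_int / n
        w_enc = [w - lr * g / n for w, g in zip(w_enc, g_enc)]

    def plan_for(x):
        # Plan produced by the trained chain for a new observation.
        return w_dec * w_int * (w_enc[0] * x[0] + w_enc[1] * x[1])

    return w_enc, plan_for


if __name__ == "__main__":
    w_enc, plan_for = train_toy_belief_intent()
    print("encoder weights:", w_enc)
    print("plan for (1, -1):", plan_for((1.0, -1.0)))
```

In this toy setting the encoder weight on the irrelevant feature decays toward zero even though no loss term asks for it explicitly, which is the sense in which the planning gradient shapes the belief representation. A real model would use learned token sets and nonlinear networks; the sketch only shows the direction of gradient flow.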
Architecture
Figure 2 (implied from text): The TIWM architecture, illustrating the transition from dense BEV inputs to sparse Belief/Intent tokens and final trajectory decoding
Breakthrough Assessment
7/10
Proposes a strong philosophical shift from reconstruction to 'active understanding' via sparse tokens. While the theoretical grounding in cognitive science is novel for E2E driving, the provided text lacks the quantitative results to confirm SOTA performance.