Spatial-to-Relational Transformation: Converting grid coordinates (e.g., 1,0) into abstract node labels (e.g., Node F) and explicit connectivity lists to remove geometric bias.
Context Inconsistency Hallucination: Errors occurring during long reasoning chains where the model contradicts its previous context or loses track of the current state.
Spatial Hallucination: The tendency of LLMs to misunderstand spatial relationships, often assuming connectivity based on coordinate similarity rather than actual map structure.
Reverse Curriculum Learning: A learning strategy that starts training with tasks close to the goal (easy) and iteratively moves the starting point further away (harder).
Q-learning: A model-free reinforcement learning algorithm that learns the expected discounted return (the Q-value) of taking a given action in a given state.
Experience Replay Buffer: A memory mechanism that stores past transitions (state, action, reward, next state) and resamples them during training, which stabilizes learning.
Epsilon-greedy: A policy in which the agent takes a random exploratory action with probability epsilon and the best-known action otherwise.
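
To make several of the terms above concrete, here is a minimal sketch that combines them on a toy map. Everything in it is an assumption made for illustration, not the method of any particular system: the 2x3 grid, the wall layout, the label order, the reward of 1 at the goal, the hyperparameters, and the helper names (to_relational, starts_by_distance, epsilon_greedy, train, greedy_path) are all hypothetical. Updating on the newest transition in addition to the replayed batch is also just a convenience chosen for this sketch.

```python
import random
from collections import deque

# Hypothetical 2x3 grid. Coordinates are (row, col); the layout is an assumption.
CELLS = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
# Open passages. The two rows connect only through the right-hand column, so
# (0, 0) and (1, 0) look adjacent by coordinates but are not connected.
PASSAGES = [((0, 0), (0, 1)), ((0, 1), (0, 2)), ((0, 2), (1, 2)),
            ((1, 2), (1, 1)), ((1, 1), (1, 0))]

def to_relational(cells, passages):
    """Spatial-to-relational: replace coordinates with labels A, B, ... and adjacency lists."""
    labels = {cell: chr(ord("A") + i) for i, cell in enumerate(cells)}
    graph = {label: [] for label in labels.values()}
    for a, b in passages:
        graph[labels[a]].append(labels[b])
        graph[labels[b]].append(labels[a])
    return graph

def starts_by_distance(graph, goal):
    """Reverse curriculum order: start nodes sorted from nearest to farthest (BFS from goal)."""
    order, seen, queue = [], {goal}, deque([goal])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                order.append(neighbor)
                queue.append(neighbor)
    return order

def epsilon_greedy(q, graph, state, epsilon, rng):
    """Random neighbor with probability epsilon, otherwise the highest-valued one."""
    if rng.random() < epsilon:
        return rng.choice(graph[state])
    return max(graph[state], key=lambda a: q[(state, a)])

def train(graph, goal, episodes_per_stage=200, alpha=0.5, gamma=0.9,
          epsilon=0.2, batch_size=8, max_steps=20, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in graph for a in graph[s]}  # tabular Q-values
    buffer = deque(maxlen=500)                          # experience replay buffer
    for start in starts_by_distance(graph, goal):       # easy starts first
        for _ in range(episodes_per_stage):
            state = start
            for _ in range(max_steps):
                action = epsilon_greedy(q, graph, state, epsilon, rng)
                next_state = action                     # an action is "move to this neighbor"
                done = next_state == goal
                transition = (state, action, 1.0 if done else 0.0, next_state, done)
                buffer.append(transition)
                replayed = rng.sample(list(buffer), min(batch_size, len(buffer)))
                # Q-learning update on the new transition plus the replayed batch.
                for s, a, r, s2, d in [transition] + replayed:
                    target = r if d else r + gamma * max(q[(s2, a2)] for a2 in graph[s2])
                    q[(s, a)] += alpha * (target - q[(s, a)])
                if done:
                    break
                state = next_state
    return q

def greedy_path(q, graph, start, goal, max_steps=20):
    """Follow the highest-valued action from start until the goal or the step cap."""
    path = [start]
    while path[-1] != goal and len(path) <= max_steps:
        s = path[-1]
        path.append(max(graph[s], key=lambda a: q[(s, a)]))
    return path

graph = to_relational(CELLS, PASSAGES)
q = train(graph, goal="D")
print(graph)
print(greedy_path(q, graph, "A", "D"))
```

In the relabeled graph, node A (cell 0,0) lists only B as a neighbor, even though node D (cell 1,0) sits directly below it on the grid. Inferring an A-to-D edge from the coordinates alone is the kind of error the Spatial Hallucination entry describes, and the explicit adjacency lists are what remove that cue.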