
Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Language Models

Yao Yao, Zuchao Li, Hai Zhao
Department of Computer Science and Engineering, Shanghai Jiao Tong University, School of Computer Science, Wuhan University
arXiv (2023)
MM Reasoning KG

📝 Paper Summary

Prompting Strategies · Reasoning in LLMs · Multimodal Reasoning
Graph-of-Thought models reasoning as a non-linear graph of connected ideas rather than a linear chain, fusing graph-encoded structural information with text and visual features for improved question answering.
Core Problem
Human thought is non-linear and often jumps between ideas, but current Chain-of-Thought (CoT) approaches force LLMs into strictly sequential reasoning, which loses the structural connections between ideas.
Why it matters:
  • Sequential chains fail to capture 'leaps of thought' where seemingly unrelated ideas connect to form solutions
  • Existing methods neglect the complex structural information inherent in reasoning (e.g., multiple premises leading to one conclusion)
  • Current multimodal approaches often treat reasoning linearly, missing the graph-like nature of human cognition
Concrete Example: In reasoning about an earthquake, a linear chain might say 'Earthquake -> shaking -> ground moves'. A graph approach captures that 'Earthquake' links to both 'earth' and 'quake', which imply 'ground' and 'shake' respectively. These two branches then converge on the final concept, modeling the deductive leap.
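The earthquake example can be made concrete with a small sketch. The triples below are hypothetical stand-ins for what an OpenIE-style extractor might produce; the relation names, the final node 'ground shakes', and the helper functions are illustrative assumptions, not taken from the paper. The point is only that the final concept has two predecessors, which a single linear chain cannot express.

```python
# Hypothetical (subject, relation, object) triples for the earthquake example.
# Relation labels and the node "ground shakes" are assumptions for illustration.
triples = [
    ("earthquake", "contains", "earth"),
    ("earthquake", "contains", "quake"),
    ("earth", "implies", "ground"),
    ("quake", "implies", "shake"),
    ("ground", "supports", "ground shakes"),
    ("shake", "supports", "ground shakes"),
]


def build_thought_graph(triples):
    """Map every node to the list of nodes it points to (insertion-ordered)."""
    successors = {}
    for subj, _relation, obj in triples:
        successors.setdefault(subj, [])
        successors.setdefault(obj, [])
        if obj not in successors[subj]:
            successors[subj].append(obj)
    return successors


def predecessors(graph, node):
    """Return every node that has an edge into `node`."""
    return [src for src, dsts in graph.items() if node in dsts]


graph = build_thought_graph(triples)
print(graph["earthquake"])                   # ['earth', 'quake']
print(predecessors(graph, "ground shakes"))  # ['ground', 'shake']
```

The branching at 'earthquake' and the merge at 'ground shakes' are the structural information that a sequential chain would flatten away.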
Key Novelty
Graph-of-Thought (GoT) Framework
  • Models thoughts as nodes in a graph (extracted via OpenIE) and connections as edges, rather than a linear sequence
  • Uses a specialized graph attention network to encode this 'thought graph' alongside standard text and vision encoders
  • Fuses the graph, text, and visual representations via a gated fusion mechanism to generate rationales and answers
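As a rough illustration of the fusion step in the last bullet, the sketch below shows one common form of gated fusion on plain Python lists. It is an assumption-laden toy: the function name `gated_fusion`, the scalar weights, and the exact combination rule are hypothetical, and the paper's actual equations (which operate on learned matrices and attention-aligned features) may differ.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(text, vision, graph, w_text=1.0, w_vision=1.0, w_graph=1.0):
    """Fuse three equal-length feature vectors element by element.

    A gate in (0, 1) is computed from all three inputs; it decides how much
    of the text feature is kept versus how much vision + graph signal is
    mixed in. The scalar weights stand in for learned projection matrices.
    """
    fused = []
    for t, v, g in zip(text, vision, graph):
        gate = sigmoid(w_text * t + w_vision * v + w_graph * g)
        fused.append((1.0 - gate) * t + gate * (v + g))
    return fused


# With all weights at zero the gate is exactly 0.5 for every element.
print(gated_fusion([1.0, 2.0], [0.0, 2.0], [1.0, 0.0], 0.0, 0.0, 0.0))  # [1.0, 2.0]
```

The design idea is that the gate lets the model lean on the text representation when the graph and image add little, and mix them in more strongly when they carry useful signal.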
Architecture
Figure 2: Overview of the Graph-of-Thought framework, detailing the two-stage process (Rationale Generation and Answer Generation) and the specific encoding modules.
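The two-stage flow can be sketched as a small pipeline, assuming each stage is a callable that maps an input string to an output string. Everything here is hypothetical: the `Model` type, the `two_stage_answer` function, the "Rationale:" prompt format, and the stub models are placeholders. In the actual framework each stage also consumes the image and the thought graph, which this text-only sketch omits.

```python
from typing import Callable, Tuple

# Hypothetical interface: a model maps an input string to an output string.
Model = Callable[[str], str]


def two_stage_answer(question: str,
                     rationale_model: Model,
                     answer_model: Model) -> Tuple[str, str]:
    """Stage 1 generates a rationale; stage 2 answers using question + rationale."""
    rationale = rationale_model(question)
    answer_input = f"{question}\nRationale: {rationale}"
    answer = answer_model(answer_input)
    return rationale, answer


# Stub models so the sketch runs without any trained network.
def stub_rationale_model(text: str) -> str:
    return "An earthquake makes the ground shake."


def stub_answer_model(text: str) -> str:
    return "(A)" if "Rationale:" in text else "(unknown)"


rationale, answer = two_stage_answer(
    "What happens during an earthquake? (A) the ground shakes (B) it rains",
    stub_rationale_model,
    stub_answer_model,
)
print(rationale)
print(answer)
```

Separating the stages means the answer model always sees an explicit rationale in its input rather than having to produce reasoning and answer in one pass.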
Evaluation Highlights
  • Improves accuracy by 2.40 percentage points over the strong Multimodal-CoT baseline on the ScienceQA test set using T5-base
  • Achieves 87.59% accuracy on ScienceQA (T5-base), surpassing the prior state-of-the-art
  • Outperforms ChatGPT by 9.28 percentage points on the ScienceQA benchmark
Breakthrough Assessment
7/10
Significant architectural innovation by explicitly encoding reasoning structure as a graph and fusing it with other modalities. Strong empirical results on ScienceQA, though primarily evaluated on T5-based models rather than the largest modern LLMs.