
StepChain GraphRAG: Reasoning Over Knowledge Graphs for Multi-Hop Question Answering

T Ni, X Yuan, S Li, K Wu, RP Liu, W Ni, W Zhang
Not reported in the paper
arXiv, 10/2025
RAG KG QA Reasoning

📝 Paper Summary

Graph-based RAG pipeline
StepChain GraphRAG combines question decomposition with breadth-first search on a dynamically updated knowledge graph to solve complex multi-hop questions transparently.
Core Problem
Current GraphRAG methods rely on static graphs or one-shot retrieval. Both fail to capture the evolving dependencies in complex multi-hop queries and often overwhelm the model with irrelevant context.
Why it matters:
  • Static graphs become cluttered or disconnected if not updated during inference, obscuring the chain of reasoning
  • One-shot retrieval risks missing critical details or providing superfluous information, compromising interpretability and accuracy in multi-step tasks
  • Lack of systematic updates prevents revisiting and refining previous insights as new evidence is discovered during iterative reasoning
Concrete Example: Consider a query like 'What is the birth city of the director of the movie starring Actor X?' A standard system might retrieve everything about Actor X in one go, missing the director link. StepChain GraphRAG decomposes this into 'Who directed the movie starring Actor X?' then 'Where was [Director] born?', updating the graph at each step.
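The decomposition in this example can be sketched as a chain of sub-questions in which each answer fills a placeholder in the next. This is a minimal illustration under stated assumptions, not the paper's implementation: `answer_sub_question`, the `[PREV]` placeholder convention, and the toy lookup table (including "Director Y" and "City Z") are hypothetical stand-ins for a real retrieval-plus-LLM step.

```python
from typing import Callable, List, Tuple

# Hypothetical toy facts standing in for retrieval + LLM answering.
TOY_FACTS = {
    "Who directed the movie starring Actor X?": "Director Y",
    "Where was Director Y born?": "City Z",
}


def answer_sub_question(question: str) -> str:
    """Stand-in for one retrieve-and-answer step (an assumption, not the paper's API)."""
    return TOY_FACTS[question]


def run_chain(
    sub_questions: List[str], answer: Callable[[str], str]
) -> List[Tuple[str, str]]:
    """Answer sub-questions in order, substituting the previous answer for '[PREV]'."""
    trace: List[Tuple[str, str]] = []
    previous = ""
    for template in sub_questions:
        question = template.replace("[PREV]", previous)
        previous = answer(question)
        trace.append((question, previous))
    return trace


steps = run_chain(
    ["Who directed the movie starring Actor X?", "Where was [PREV] born?"],
    answer_sub_question,
)
final_answer = steps[-1][1]  # answer to the last sub-question
```

The returned `steps` list keeps every resolved sub-question alongside its answer, so the intermediate hop (the director) stays visible rather than being folded into a single retrieval.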
Key Novelty
StepChain GraphRAG
  • Interleaves question decomposition with Breadth-First Search (BFS) reasoning, where each sub-question triggers a targeted graph expansion rather than a full-corpus search
  • Maintains an incremental knowledge graph that updates dynamically with every retrieval step, ensuring new evidence is instantly available for subsequent reasoning hops
  • Generates explicit 'evidence chains' (paths of entities and relations) for every sub-question, providing a transparent audit trail for how the final answer was derived
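The three ideas above (step-wise expansion, an incrementally updated graph, and explicit evidence chains) can be sketched together as follows. This is a simplified illustration, not the authors' code: the `IncrementalGraph` class, the `bfs_chain` method, the `max_hops` parameter, the relation names, and the triples are all assumptions made for the example, and the retrieval step that would produce the triples is omitted.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


class IncrementalGraph:
    """Adjacency-list graph that grows as new triples are retrieved."""

    def __init__(self) -> None:
        self.edges: Dict[str, List[Tuple[str, str]]] = {}

    def add_triples(self, triples: List[Triple]) -> None:
        """Merge newly retrieved evidence into the existing graph."""
        for head, relation, tail in triples:
            self.edges.setdefault(head, []).append((relation, tail))
            self.edges.setdefault(tail, [])

    def bfs_chain(
        self, start: str, target_relation: str, max_hops: int = 3
    ) -> Optional[List[Triple]]:
        """Breadth-first search from `start`; return the path of triples that
        ends in the first edge labeled `target_relation`, or None."""
        queue = deque([(start, [])])
        visited = {start}
        while queue:
            node, path = queue.popleft()
            if len(path) >= max_hops:
                continue  # do not expand beyond the hop budget
            for relation, neighbor in self.edges.get(node, []):
                new_path = path + [(node, relation, neighbor)]
                if relation == target_relation:
                    return new_path
                if neighbor not in visited:
                    visited.add(neighbor)
                    queue.append((neighbor, new_path))
        return None


graph = IncrementalGraph()

# Sub-question 1: hypothetical evidence retrieved for "Who directed the movie?"
graph.add_triples([
    ("Actor X", "starred_in", "Movie M"),
    ("Movie M", "directed_by", "Director Y"),
])
chain_1 = graph.bfs_chain("Actor X", "directed_by")
director = chain_1[-1][2]

# Sub-question 2: the graph is updated before the next hop is searched.
graph.add_triples([(director, "born_in", "City Z")])
chain_2 = graph.bfs_chain(director, "born_in")

evidence_chain = chain_1 + chain_2  # audit trail for the final answer
```

Each call to `bfs_chain` returns the path of triples it followed, so concatenating the per-step paths gives one readable chain from the question entity to the answer.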
Architecture
Figure 2
The complete StepChain GraphRAG pipeline from document chunking to final answer synthesis.
Evaluation Highlights
  • +4.70% Exact Match (EM) and +3.44% F1 improvement on HotpotQA compared to the strongest baseline (HopRAG)
  • Achieves state-of-the-art results across MuSiQue, 2WikiMultiHopQA, and HotpotQA benchmarks, with an average EM gain of +2.57%
  • Outperforms GPT-4o (no retrieval) by more than 30% EM on average, confirming the critical role of the graph-based reasoning pipeline
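For readers unfamiliar with the two metrics, the sketch below shows the SQuAD-style definitions of Exact Match and token-level F1 that are commonly used for these benchmarks. The summary does not state which scoring script the paper uses, so the normalization rules here (lowercasing, stripping punctuation and English articles) are an assumption, and the function names are illustrative.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, strip punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

EM rewards only answers that match after normalization, while F1 gives partial credit for overlapping tokens, which is why the two gains reported for the same dataset can differ.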
Breakthrough Assessment
8/10
Significant improvement over SOTA on difficult multi-hop datasets by successfully integrating dynamic graph construction with iterative reasoning. High explainability adds practical value.