Evaluating Knowledge Graph Based Retrieval Augmented Generation Methods under Knowledge Incompleteness

📝 Paper Summary

Graph-based RAG pipeline

This paper systematically evaluates three KG-RAG methods (RoG, ToG, G-Retriever) by simulating KG incompleteness via random deletion and reasoning path disruption, finding that while they outperform retrieval-free baselines, they are highly sensitive to missing direct evidence.

Core Problem

Existing KG-RAG evaluations typically assume complete Knowledge Graphs (KGs) where answers are directly inferable, failing to reflect real-world scenarios where KGs are often incomplete.

Why it matters:

KGs in practice are often sparse and missing critical links.
Current benchmarks do not test whether KG-RAG methods can reason over incomplete data or whether they break down when direct evidence is missing.

Key Novelty

Systematic Robustness Evaluation of KG-RAG under Incompleteness

Introduces specific deletion strategies: 'Random Triple Deletion' (general sparsity) and 'Reasoning Path Disruption' (targeting critical reasoning chains).
Assesses whether LLMs can compensate for missing KG links using their internal parametric knowledge or alternative paths.
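The two deletion strategies can be sketched as follows. This is a minimal illustration rather than the paper's implementation: the function names, the (head, relation, tail) tuple representation, the seeding, and the choice to remove every triple on the supplied path are all assumptions.

```python
import random

Triple = tuple[str, str, str]  # (head, relation, tail); this representation is an assumption


def random_triple_deletion(triples: list[Triple], ratio: float, seed: int = 0) -> list[Triple]:
    """Drop a fraction `ratio` of triples uniformly at random (general sparsity)."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")
    rng = random.Random(seed)  # seeded so the same perturbed KG can be rebuilt
    n_drop = round(len(triples) * ratio)
    drop_indices = set(rng.sample(range(len(triples)), n_drop))
    return [t for i, t in enumerate(triples) if i not in drop_indices]


def reasoning_path_disruption(triples: list[Triple], path: list[Triple]) -> list[Triple]:
    """Remove the triples on one known reasoning path (assumed here: all of them)."""
    blocked = set(path)
    return [t for t in triples if t not in blocked]


# Hypothetical toy KG with ten triples.
toy_kg = [(f"e{i}", "related_to", f"e{i + 1}") for i in range(10)]
sparse_kg = random_triple_deletion(toy_kg, ratio=0.2, seed=1)
disrupted_kg = reasoning_path_disruption(toy_kg, path=[toy_kg[0]])
print(len(sparse_kg), len(disrupted_kg))
```

Random deletion is blind to which triples matter for a question, while path disruption needs the reasoning path as input, which is why the second strategy is the more targeted stress test.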

Architecture

An illustration of 'Reasoning Path Disruption' using the example of Justin Bieber's brother: once the direct link is removed, the model either switches to an alternative multi-hop path or fails to answer.
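The idea behind the illustration can be reproduced on a toy graph. The sketch below is an assumption-laden example, not the paper's figure: the three triples, the relation names (`sibling`, `parent`, `child`), and the breadth-first search helper are illustrative choices.

```python
from collections import deque

Triple = tuple[str, str, str]


def shortest_path(triples: list[Triple], source: str, target: str):
    """Breadth-first search over directed triples; returns the path as a list of triples."""
    adjacency: dict[str, list[tuple[str, str]]] = {}
    for head, relation, tail in triples:
        adjacency.setdefault(head, []).append((relation, tail))
    queue = deque([(source, [])])
    visited = {source}
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        for relation, neighbor in adjacency.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, path + [(node, relation, neighbor)]))
    return None  # target is unreachable in this KG


# Illustrative triples (assumed, not taken from the paper's figure).
direct = ("Justin Bieber", "sibling", "Jaxon Bieber")
full_kg = [
    direct,
    ("Justin Bieber", "parent", "Jeremy Bieber"),
    ("Jeremy Bieber", "child", "Jaxon Bieber"),
]
disrupted_kg = [t for t in full_kg if t != direct]

path_full = shortest_path(full_kg, "Justin Bieber", "Jaxon Bieber")
path_disrupted = shortest_path(disrupted_kg, "Justin Bieber", "Jaxon Bieber")
print(len(path_full), len(path_disrupted))
```

In the disrupted graph the answer is still reachable, but only through the two-hop parent/child route. A method that searches only for the direct relation would miss it.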

Evaluation Highlights

RoG on WebQSP: Accuracy drops from 76.75% (no deletion) to 72.15% under 20% random deletion and, more sharply, to 65.43% under reasoning path disruption (-11.32 points, a 14.7% relative drop).
ToG on WebQSP: Accuracy drops from 44.19% to 41.61% (20% random deletion) and to 38.25% under reasoning path disruption.
G-Retriever on WebQSP: Accuracy drops from 53.43% to 50.88% (20% random deletion) and to 49.36% under reasoning path disruption.
Despite drops, all methods still outperform 'No Retrieval' baselines (e.g., RoG no-retrieval is 50.46%).
Models often fail to switch to alternative valid reasoning paths when the primary/shortest path is broken.

Breakthrough Assessment

3/10

It is a solid evaluation paper that highlights a weakness in current methods, but it does not propose a new method to solve the problem, serving primarily as a diagnostic study.

⚙️ Technical Details

Pipeline Flow

Select KG-RAG method (RoG, ToG, or G-Retriever).
Apply deletion strategy to the Knowledge Graph (Random Deletion or Path Disruption).
Run the KG-RAG method on the modified KG.
Evaluate Accuracy and Hits@1 against ground truth answers.
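The four steps above can be sketched as a small evaluation loop. Everything here is hypothetical scaffolding: `lookup_method` is a trivial stand-in for RoG, ToG, or G-Retriever, the `head|relation` question format is invented for the toy data, and the score is a simple fraction of correctly answered questions.

```python
from typing import Callable

Triple = tuple[str, str, str]
Method = Callable[[str, list[Triple]], str]            # (question, KG) -> predicted answer
Perturbation = Callable[[list[Triple]], list[Triple]]  # KG -> modified KG


def evaluate(method: Method, perturb: Perturbation,
             kg: list[Triple], dataset: list[tuple[str, set[str]]]) -> float:
    """Apply a deletion strategy, run the method on the modified KG, score the answers."""
    perturbed_kg = perturb(kg)
    correct = sum(method(question, perturbed_kg) in gold for question, gold in dataset)
    return correct / len(dataset)


def lookup_method(question: str, kg: list[Triple]) -> str:
    """Stand-in KG-RAG method: answers only when a direct triple exists."""
    head, relation = question.split("|")
    for h, r, t in kg:
        if (h, r) == (head, relation):
            return t
    return "unknown"


# Hypothetical toy KG and question set.
kg = [("A", "capital", "B"), ("C", "capital", "D")]
dataset = [("A|capital", {"B"}), ("C|capital", {"D"})]

full_score = evaluate(lookup_method, lambda triples: triples, kg, dataset)
disrupted_score = evaluate(lookup_method, lambda triples: triples[1:], kg, dataset)
print(full_score, disrupted_score)
```

Passing the perturbation as a function keeps the method and the deletion strategy independent, so each method can be run under every condition with the same loop.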

System Modules

KG Deletion Module

Simulate incompleteness.

Model or implementation: N/A (Algorithmic)

KG-RAG Model

Retrieve and Generate Answer.

Model or implementation: Varied (RoG, ToG, G-Retriever)

📊 Experiments & Results

Evaluation Setup

Perturbation analysis on standard KGQA benchmarks.

Benchmarks:

WebQuestionsSP (WebQSP) (KGQA)
Complex WebQuestions (CWQ) (KGQA)

Metrics:

Accuracy
Hits@1
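This summary does not define either metric, so the sketch below adopts one plausible reading and should be treated as an assumption: Hits@1 checks whether the top-ranked prediction is a gold answer, and accuracy is taken as the fraction of gold answers that appear among the predictions, averaged over questions. The function names are hypothetical.

```python
def hits_at_1(ranked_predictions: list[str], gold_answers: set[str]) -> float:
    """1.0 if the top-ranked prediction is a gold answer, else 0.0."""
    return float(bool(ranked_predictions) and ranked_predictions[0] in gold_answers)


def answer_accuracy(predictions: list[str], gold_answers: set[str]) -> float:
    """Fraction of gold answers recovered (assumed definition, not from the paper)."""
    if not gold_answers:
        return 0.0
    return len(set(predictions) & gold_answers) / len(gold_answers)


def mean(values: list[float]) -> float:
    return sum(values) / len(values)


# Hypothetical predictions and gold answers for two questions.
examples = [
    (["a", "b"], {"a", "c"}),  # top prediction is correct, one of two gold answers found
    (["x"], {"y"}),            # nothing correct
]
mean_hits = mean([hits_at_1(pred, gold) for pred, gold in examples])
mean_accuracy = mean([answer_accuracy(pred, gold) for pred, gold in examples])
print(mean_hits, mean_accuracy)
```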

Key Results

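Benchmark	Setting	Metric	Baseline (%)	Compared (%)	Δ (points)
WebQSP	RoG, reasoning path disruption	Accuracy	76.75	65.43	-11.32
WebQSP	RoG, 20% random deletion	Accuracy	76.75	72.15	-4.60
CWQ	unspecified	Accuracy	57.49	52.95	-4.54
WebQSP	RoG, no retrieval	Accuracy	76.75	50.46	-26.29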
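The Δ column reports absolute differences in percentage points, which is easy to confuse with a relative change. The short check below, with hypothetical helper names, recomputes both for the reported WebQSP accuracies.

```python
def absolute_drop(baseline: float, compared: float) -> float:
    """Difference in percentage points (negative means a drop)."""
    return compared - baseline


def relative_drop(baseline: float, compared: float) -> float:
    """Change as a percentage of the baseline value."""
    return 100.0 * (compared - baseline) / baseline


baseline = 76.75  # accuracy with no deletion, as reported
reported = {
    "reasoning path disruption": 65.43,
    "20% random deletion": 72.15,
    "no retrieval": 50.46,
}

for setting, value in reported.items():
    points = absolute_drop(baseline, value)
    percent = relative_drop(baseline, value)
    print(f"{setting}: {points:.2f} points, {percent:.1f}% relative")
```

For reasoning path disruption this gives -11.32 points, which corresponds to a relative drop of about 14.7%.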

Main Takeaways

KG-RAG methods are sensitive to incomplete knowledge; disrupting specific reasoning paths causes larger drops than random deletion (for RoG on WebQSP, -11.32 points versus -4.60).
Even with 20% of the KG's triples deleted, KG-RAG still outperforms retrieval-free LLMs, indicating that an incomplete KG retains value as a retrieval source.
Current models struggle to adapt by finding alternative reasoning paths when the most obvious one is missing.