Keqing: Knowledge-based Question Answering is a Nature Chain-of-Thought Mentor of LLM

📝 Paper Summary

Knowledge-based Question Answering (KBQA)
Modularized RAG pipeline
Chain-of-Thought (CoT)

Keqing improves Large Language Model reliability in question answering by decomposing complex questions into sub-questions that map directly to executable logical chains on a Knowledge Graph.

Core Problem

LLMs often hallucinate when answering knowledge-intensive questions, and existing retrieval-augmented methods that rely on embedding-based retrieval tend to introduce redundant or irrelevant context that confuses the model.

Why it matters:

Standard embedding-based retrieval fetches noisy documents that occupy token space without guaranteeing precise answers
Direct text-to-SQL generation by LLMs is prone to syntax errors and often produces unexecutable queries
Current methods lack interpretability regarding how an answer was derived step-by-step

Concrete Example: For the question 'what other works the director of Written on Wind has done?', a standard LLM might hallucinate movies. Keqing decomposes this into 'who was the director of [Written on Wind]?' then '[Director] was the director of which movies?', retrieving exact triplets from the Knowledge Graph.

Key Novelty

Decomposition-based Retrieval on Knowledge Graphs (Keqing)

Uses predefined templates to decompose complex questions into simpler sub-questions, treating the decomposition process as a natural Chain-of-Thought
Maps decomposed sub-questions to pre-collected logical chains on a Knowledge Graph to retrieve precise candidate entities rather than dense text chunks
Iteratively solves sub-questions using retrieved triplets, passing answers as seeds to the next sub-question in the chain

Architecture

Figure: The complete workflow of Keqing answering a complex movie question.

Evaluation Highlights

Achieves 93.3% accuracy on MetaQA-3hop (complex multi-hop questions), comparable to state-of-the-art methods like KB-BINDER (0.5 points below the 93.8% baseline listed under Key Results)
Surpasses standard embedding-based retrieval (DPR) methods on precision by retrieving structured triplets instead of noisy text passages
Demonstrates high interpretability by generating responses that explicitly trace the reasoning path (e.g., entity A -> relation R -> entity B)

Breakthrough Assessment

7/10

Solid framework for structured KBQA that avoids the pitfalls of text-to-SQL. It effectively bridges the gap between unstructured LLM reasoning and structured Knowledge Graph querying, though reliance on predefined templates may limit open-ended generalization.

⚙️ Technical Details

Problem Definition

Setting: Knowledge-based Question Answering (KBQA) where a natural language query q must be answered using entities from a symbolic Knowledge Graph K

Inputs: Natural language question q

Outputs: Answer list A containing entities from the Knowledge Graph

Pipeline Flow

Question Decomposition (LLM + LoRA)
Template Matching (RoBERTa)
Knowledge Retrieval (Graph Traversal)
Candidate Reasoning (LLM)
Response Generation (LLM)

System Modules

Question Decomposition

Decomposes complex user query into a sequence of sub-questions with placeholders

Model or implementation: LLaMA-7B fine-tuned with LoRA
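
To make the idea of sub-questions with placeholders concrete, here is a minimal sketch of how a later sub-question could be filled with an earlier answer. The `[#k]` placeholder syntax, the function names, and the example answer are assumptions for illustration; the paper's exact output format is not given in this summary.

```python
import re

# Hypothetical decomposition output for the question in the Concrete Example.
# "[#1]" stands for "the answer to sub-question 1" (assumed syntax).
SUB_QUESTIONS = [
    "who was the director of [Written on Wind]?",
    "[#1] was the director of which movies?",
]

PLACEHOLDER = re.compile(r"\[#(\d+)\]")


def fill_slots(sub_question: str, answers: dict[int, str]) -> str:
    """Replace each [#k] placeholder with the answer to sub-question k."""
    def substitute(match: re.Match) -> str:
        index = int(match.group(1))
        # Leave the placeholder untouched if that answer is not known yet.
        return f"[{answers[index]}]" if index in answers else match.group(0)
    return PLACEHOLDER.sub(substitute, sub_question)


def seed_entities(sub_question: str) -> list[str]:
    """Extract bracketed entity mentions, skipping unfilled placeholders."""
    mentions = re.findall(r"\[([^\]]+)\]", sub_question)
    return [m for m in mentions if not m.startswith("#")]


if __name__ == "__main__":
    print(seed_entities(SUB_QUESTIONS[0]))
    print(fill_slots(SUB_QUESTIONS[1], {1: "Douglas Sirk"}))
```

Keeping entities in brackets means the same pattern can serve both to find seed entities and to mask them before template matching.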

Template Matcher (Retrieval & Selection)

Maps generated sub-questions to the most similar predefined question templates to identify executable logical chains

Model or implementation: RoBERTa
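
The matching step can be sketched as a nearest-template search under cosine similarity. The sketch below swaps RoBERTa embeddings for a plain bag-of-words vector so that it runs without any model; the templates, relation names, and the `^-1` notation for an inverse relation are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical templates, each paired with a logical chain (list of relations).
TEMPLATES = {
    "who was the director of [movie]?": ["directed_by"],
    "[person] was the director of which movies?": ["directed_by^-1"],
    "who acted in [movie]?": ["starred_actors"],
}


def embed(text: str) -> Counter:
    """Stand-in for a sentence encoder: word counts with bracketed entities masked."""
    masked = re.sub(r"\[[^\]]*\]", " ", text.lower())
    return Counter(re.findall(r"[a-z]+", masked))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_template(sub_question: str) -> tuple[str, list[str]]:
    """Return the most similar template and its associated logical chain."""
    query = embed(sub_question)
    best = max(TEMPLATES, key=lambda template: cosine(query, embed(template)))
    return best, TEMPLATES[best]


if __name__ == "__main__":
    print(match_template("who is the director of [Written on Wind]?"))
```

With a real encoder, only `embed` would change; the similarity ranking and the template-to-chain lookup stay the same.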

Knowledge Retriever (Retrieval & Selection)

Executes logical chains on the KG starting from seed entities to find candidate triplets

Model or implementation: Symbolic Graph Traversal (Non-neural)
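
Executing a logical chain from seed entities can be illustrated on a toy graph. This is a sketch under assumptions: the triplets, the relation names, and the `^-1` suffix for traversing a relation from object to subject are illustrative and not taken from the paper.

```python
# Toy knowledge graph as (subject, relation, object) triplets; contents are illustrative.
TRIPLETS = [
    ("Written on Wind", "directed_by", "Douglas Sirk"),
    ("All That Heaven Allows", "directed_by", "Douglas Sirk"),
    ("Written on Wind", "starred_actors", "Rock Hudson"),
]


def step(seeds, relation):
    """Follow one relation from the seeds; a '^-1' suffix means object -> subject."""
    inverse = relation.endswith("^-1")
    name = relation[:-3] if inverse else relation
    hits = []
    for s, r, o in TRIPLETS:
        if r != name:
            continue
        if (inverse and o in seeds) or (not inverse and s in seeds):
            hits.append((s, r, o))
    return hits


def execute_chain(seeds, chain):
    """Run a chain hop by hop; return the triplets touched and the final entities."""
    evidence, frontier = [], set(seeds)
    for relation in chain:
        hits = step(frontier, relation)
        evidence.extend(hits)
        inverse = relation.endswith("^-1")
        # The entities reached in this hop become the seeds of the next hop.
        frontier = {s if inverse else o for s, _, o in hits}
    return evidence, frontier


if __name__ == "__main__":
    print(execute_chain({"Written on Wind"}, ["directed_by", "directed_by^-1"]))
```

Returning the touched triplets alongside the end entities keeps the evidence available for the downstream reasoning and response steps.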

Candidate Reasoner (Retrieval & Selection)

Selects the correct answer entity from the retrieved candidate triplets

Model or implementation: ChatGPT or LLaMA

Response Generator

Synthesizes final natural language response based on the execution log

Model or implementation: LLM (e.g., ChatGPT)

Novel Architectural Elements

Intermediary Question Templates: Uses a semantic matching layer (RoBERTa) to bridge generated natural language sub-questions to rigid Knowledge Graph logical chains
Iterative Graph-guided CoT: The reasoning chain is not only generated text but an actual traversal of the KG, where the output of step N supplies the seed entities for the retrieval in step N+1
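
The iterative hand-off between steps can be sketched as a loop over sub-questions. The retriever and reasoner are passed in as plain callables; `toy_retrieve`, `toy_reason`, and the small `KG` dictionary are hypothetical stand-ins for the symbolic retriever and the LLM reasoner, not the paper's components.

```python
from typing import Callable

Triplet = tuple[str, str, str]


def answer_question(
    sub_questions: list[str],
    initial_seeds: list[str],
    retrieve: Callable[[str, list[str]], list[Triplet]],
    reason: Callable[[str, list[Triplet]], list[str]],
) -> tuple[list[str], list[dict]]:
    """Solve sub-questions in order, feeding each answer forward as the next seeds."""
    seeds, log = list(initial_seeds), []
    for sub_question in sub_questions:
        triplets = retrieve(sub_question, seeds)  # symbolic KG lookup
        answers = reason(sub_question, triplets)  # an LLM in the paper; any callable here
        log.append({"question": sub_question, "seeds": seeds,
                    "triplets": triplets, "answers": answers})
        if not answers:  # a failed hop ends the chain early
            break
        seeds = answers  # step N's output becomes step N+1's seeds
    return (log[-1]["answers"] if log else []), log


# Toy stand-ins so the loop can be run end to end (illustrative data only).
KG = {
    ("Written on Wind", "directed_by"): ["Douglas Sirk"],
    ("Douglas Sirk", "directed"): ["Written on Wind", "All That Heaven Allows"],
}


def toy_retrieve(sub_question: str, seeds: list[str]) -> list[Triplet]:
    relation = "directed" if "which movies" in sub_question else "directed_by"
    return [(s, relation, o) for s in seeds for o in KG.get((s, relation), [])]


def toy_reason(sub_question: str, triplets: list[Triplet]) -> list[str]:
    return sorted({o for _, _, o in triplets})


if __name__ == "__main__":
    questions = ["who was the director of [Written on Wind]?",
                 "[#1] was the director of which movies?"]
    final, trace = answer_question(questions, ["Written on Wind"], toy_retrieve, toy_reason)
    print(final)
```

The per-step log is the kind of execution record a response generator could turn into an explanation of the reasoning path.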

Modeling

Base Model: LLaMA-7B (for Decomposition) and ChatGPT (for Reasoning/Generation)

Training Method: Supervised Fine-Tuning with LoRA (for Question Decomposition)

Objective Functions:

Purpose: Train decomposition model to generate correct sub-question sequences.

Formally: Standard causal language modeling loss minimizing negative log-likelihood of target sub-questions given input query.
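
Written out, the loss described above takes the usual autoregressive form. The symbols are notation introduced here, not the paper's: q is the input question, y_1, ..., y_T are the tokens of the target sub-question sequence, and θ denotes the trainable (LoRA) parameters.

```latex
\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_{\theta}\left(y_t \mid q,\, y_{<t}\right)
```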

Adaptation: LoRA (Low-Rank Adaptation)

Trainable Parameters: LoRA parameters only (for the decomposition LLaMA model)
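
A rough sense of how few parameters LoRA trains can be had from simple arithmetic on one square projection matrix. The hidden size 4096 is that of LLaMA-7B; the rank of 8 is an assumption, since the rank used is not stated in this summary.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the two low-rank factors A (rank x d_in) and B (d_out x rank)."""
    return rank * d_in + d_out * rank


def full_params(d_in: int, d_out: int) -> int:
    """Parameters in the frozen dense weight matrix."""
    return d_in * d_out


HIDDEN = 4096  # LLaMA-7B hidden size
RANK = 8       # assumed value for illustration

if __name__ == "__main__":
    lora = lora_trainable_params(HIDDEN, HIDDEN, RANK)
    full = full_params(HIDDEN, HIDDEN)
    print(lora, full, lora / full)
```

Under these assumptions, each adapted matrix adds 65,536 trainable parameters against 16,777,216 frozen ones, a ratio of 1/256.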

Training Data:

KBQA datasets (WebQSP, CWQ, MetaQA)
Logical chains collected from training set QA pairs
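
One plausible way to collect logical chains from QA pairs is to search the graph for relation paths that connect the question's seed entity to a gold answer. The breadth-first sketch below is an assumption about how this could be done, not the paper's procedure; the triplets and the `^-1` inverse-relation suffix are illustrative.

```python
# Toy knowledge graph; contents are illustrative.
TRIPLETS = [
    ("Written on Wind", "directed_by", "Douglas Sirk"),
    ("All That Heaven Allows", "directed_by", "Douglas Sirk"),
    ("Written on Wind", "starred_actors", "Rock Hudson"),
]


def build_adjacency(triplets):
    """Index each triplet in both directions; inverse edges get a '^-1' suffix."""
    adjacency = {}
    for s, r, o in triplets:
        adjacency.setdefault(s, []).append((r, o))
        adjacency.setdefault(o, []).append((r + "^-1", s))
    return adjacency


def find_chains(adjacency, seed, answer, max_hops=3):
    """Return the shortest relation paths from seed to answer, up to max_hops."""
    frontier = [(seed, [])]
    for _ in range(max_hops):
        next_frontier = [
            (neighbor, path + [relation])
            for entity, path in frontier
            for relation, neighbor in adjacency.get(entity, [])
        ]
        found = []
        for entity, path in next_frontier:
            if entity == answer and path not in found:
                found.append(path)
        if found:  # stop at the first depth that reaches the answer
            return found
        frontier = next_frontier
    return []


if __name__ == "__main__":
    adjacency = build_adjacency(TRIPLETS)
    print(find_chains(adjacency, "Written on Wind", "All That Heaven Allows"))
```

On larger graphs several paths may connect the same pair, so some filtering across many QA pairs would likely be needed before a path is stored as a template's chain.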

Key Hyperparameters:

base_model: LLaMA-7B
similarity_metric: Cosine similarity (for RoBERTa matching)

Compute: Not reported in the paper

Comparison to Prior Work

vs. KB-BINDER: Keqing decomposes to text sub-questions first, then maps to logic, rather than generating logic directly (avoiding syntax errors)
vs. DPR: Retrieving structured triplets (s, r, o) instead of unstructured text passages reduces noise
vs. Text-to-SQL methods: Uses template matching as a 'soft' interface between natural language and strict schema [not cited in paper]

Limitations

Relies on predefined question templates, which may limit generalization to questions with unseen structures
Performance depends heavily on the coverage and quality of the underlying Knowledge Graph
Template matching using RoBERTa acts as a bottleneck; incorrect matching breaks the reasoning chain

Reproducibility

No code is provided. The paper relies on public datasets (WebQSP, CWQ, MetaQA) and standard models (LLaMA, RoBERTa, ChatGPT). The method for reverse-engineering logical chains from training data is described, but the scripts are not released.

📊 Experiments & Results

Evaluation Setup

Knowledge-based Question Answering on standard benchmarks

Benchmarks:

MetaQA: multi-hop reasoning (1-hop, 2-hop, 3-hop)
WebQSP: knowledge-based QA
CWQ (ComplexWebQuestions): complex knowledge-based QA

Metrics:

Accuracy (Hits@1)
F1 score
Statistical methodology: Not explicitly reported in the paper
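
For reference, the two metrics can be computed per question as follows. This is a generic sketch of Hits@1 and set-based F1 over answer entities; the paper's exact evaluation protocol (answer normalization, averaging) is not described in this summary and is assumed here.

```python
def hits_at_1(ranked_predictions, gold_answers):
    """1.0 if the top-ranked prediction is a gold answer, else 0.0."""
    return float(bool(ranked_predictions) and ranked_predictions[0] in gold_answers)


def f1_score(predictions, gold_answers):
    """Set-based F1 between predicted and gold answer entities."""
    predicted, gold = set(predictions), set(gold_answers)
    overlap = len(predicted & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(hits_at_1(["Douglas Sirk"], {"Douglas Sirk"}))
    print(f1_score(["a", "b"], ["a", "c"]))
```

Benchmark-level scores would then be the mean of these per-question values.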

Key Results

Keqing demonstrates strong performance on MetaQA, particularly for complex multi-hop questions, outperforming or matching strong baselines.

Benchmark	Metric	Baseline	This Paper	Δ
MetaQA 3-hop	Accuracy (%)	93.8	93.3	-0.5
MetaQA 2-hop	Accuracy (%)	83.5	97.9	+14.4
WebQSP	Accuracy (%)	73.2	73.5	+0.3
CWQ	Accuracy (%)	41.1	40.8	-0.3

Main Takeaways

Keqing matches state-of-the-art performance on complex KBQA tasks while providing better interpretability.
The decomposition strategy works particularly well for multi-hop questions (MetaQA), reducing the noise that typically confuses embedding-based retrievers.
Structured retrieval (triplets) is more token-efficient and precise for LLM reasoning than dense passage retrieval.

📚 Prerequisite Knowledge

Prerequisites

Knowledge Graphs (entities, relations, triplets)
Chain-of-Thought (CoT) prompting
Basic understanding of fine-tuning LLMs (LoRA)

Key Terms

KBQA: Knowledge-Based Question Answering—answering questions by querying a structured database (Knowledge Graph) rather than unstructured text

LoRA: Low-Rank Adaptation—a parameter-efficient fine-tuning technique that freezes pre-trained weights and injects trainable rank decomposition matrices

logical chain: A sequence of relations in a Knowledge Graph that connects a start entity to a target answer (e.g., Director -> Directed_Movie -> Actor)

triplet: The fundamental unit of a Knowledge Graph, consisting of (Subject, Relation, Object)

seed entity: The starting entity identified in a question (e.g., 'Written on Wind') used to begin a traversal on the Knowledge Graph

slot filling: The process of inserting specific values (like entity names) into predefined placeholders in a template

RoBERTa: A robustly optimized BERT pretraining approach; a transformer model used here for semantic similarity matching

KB-BINDER: A strong baseline method for KBQA that generates logical forms

DPR: Dense Passage Retrieval—a method using dense vector representations to retrieve relevant text passages