SymAgent: A Neural-Symbolic Self-Learning Agent Framework for Complex Reasoning over Knowledge Graphs

📝 Paper Summary

Neuro-symbolic AI · Agentic RAG pipeline · Knowledge Graph Reasoning

SymAgent is a neural-symbolic agent that treats Knowledge Graphs as dynamic environments, using symbolic rules for planning and a self-learning loop to autonomously improve reasoning and identify missing knowledge.

Core Problem

Existing KGQA methods either treat KGs as static repositories (ignoring inherent logic) or assume KGs are complete (failing when data is missing), while LLMs often hallucinate on complex reasoning.

Why it matters:

Real-world Knowledge Graphs are often incomplete, causing standard semantic parsing (SPARQL) methods to fail execution.
Retrieval-augmented generation often fetches irrelevant or noisy subgraph information, confusing the LLM.
Current methods struggle to bridge the semantic gap between natural language questions and the structured, symbolic nature of KGs.

Concrete Example: For the question 'Where was the person who recorded song X born?', if the link between song X and the artist is missing in the KG, standard parsers return no answer. SymAgent detects this gap, uses a `searchWikidata` tool to find the artist in text, extracts the triple, and completes the reasoning.

Key Novelty

Neural-Symbolic Self-Learning Agent (SymAgent)

Agent-Planner: Uses LLM inductive reasoning to extract symbolic logic rules from the KG to guide question decomposition (e.g., inferring that 'birthplace' questions follow a specific relation chain).
Agent-Executor: Treats the KG as a dynamic environment, autonomously selecting tools (graph traversal or external text search) based on execution feedback to handle missing data.
Self-Learning: An iterative training loop where the agent explores, self-reflects to refine trajectories, and updates its policy using outcome-based rewards without human annotation.

Evaluation Highlights

With Qwen2-7B as its backbone, SymAgent achieves a +30.17% improvement in F1 score over GPT-4 on complex reasoning datasets.
Outperforms state-of-the-art ToG and RoG baselines on WebQSP, CWQ, and MetaQA-3hop datasets, achieving 78.54% Hits@1 on WebQSP.
Zero-shot generalization: Achieves a 6x higher F1 score than the base LLM on the domain-specific MetaQA-3hop dataset by effectively leveraging KG structure.

Breakthrough Assessment

8/10

Significant performance gains using smaller models (7B) against GPT-4. Effectively addresses the critical 'incomplete KG' problem via a novel self-learning tool-use framework.

⚙️ Technical Details

Problem Definition

Setting: Partially Observable Markov Decision Process (POMDP) defined as (Q, S, A, O, T) where the KG acts as the environment.

Inputs: Natural language question q and a Knowledge Graph G

Outputs: Final answer entity set A_result

Pipeline Flow

Agent-Planner (Induces symbolic rules from KG based on seed questions)
Agent-Executor (Iterative Thought-Action-Observation Loop)
Tool Execution (Graph traversal or External Text Search)
Answer Generation
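
The flow above can be illustrated with a minimal sketch of a thought-action-observation loop. This is not the paper's implementation: the toy KG, the `neighbors` helper, the `run_episode` function, and the `max_steps` default are all hypothetical, and a fixed relation chain stands in for the rule the Agent-Planner would induce with an LLM.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# Toy KG with invented entity and relation names (not from the paper).
TOY_KG: List[Triple] = [
    ("song_x", "recorded_by", "artist_a"),
    ("artist_a", "born_in", "city_b"),
]


def neighbors(kg: List[Triple], entity: str, relation: str) -> Set[str]:
    """Graph-traversal tool: tails reachable from `entity` via `relation`."""
    return {t for h, r, t in kg if h == entity and r == relation}


def run_episode(kg: List[Triple], topic_entity: str,
                rule_body: List[str], max_steps: int = 5):
    """Follow a planner-provided relation chain, one hop per step.

    Returns the answer set and a trace of thought/observation strings.
    An empty answer set means a hop failed or the step budget ran out.
    """
    frontier = {topic_entity}
    trace: List[str] = []
    for step, relation in enumerate(rule_body):
        if step >= max_steps:
            trace.append("stopped: exceeded maximum steps")
            return set(), trace
        observation: Set[str] = set()
        for entity in sorted(frontier):
            observation |= neighbors(kg, entity, relation)
        trace.append(
            f"thought: follow {relation} | observation: {sorted(observation)}"
        )
        if not observation:
            # A fuller agent would switch to an external text tool here.
            return set(), trace
        frontier = observation
    return frontier, trace
```

Calling `run_episode(TOY_KG, "song_x", ["recorded_by", "born_in"])` walks both hops; removing the first triple makes the first hop return nothing, which is the situation the external-search tool is meant to handle.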

System Modules

Agent-Planner

Bridge natural language and KG structure by generating symbolic rules to guide reasoning.

Model or implementation: LLM (e.g., Qwen2-7B)

Agent-Executor

Navigate the KG and external sources to answer the question using the plan.

Model or implementation: LLM (e.g., Qwen2-7B)

Toolbox

Interface with the environment.

Model or implementation: Deterministic functions

Novel Architectural Elements

Planner that induces symbolic rules via LLM to use as a high-level navigational map
Hybrid action space combining structured graph queries (searchNeighbor) and unstructured text retrieval (searchWikidata) with automatic triple extraction
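
A rough sketch of the hybrid action space follows, assuming a toy in-memory KG and a one-document stand-in for the external text source. The function names `search_neighbor`, `search_text_stub`, `extract_triples`, and `act` are hypothetical, and the regular-expression extractor is only a placeholder for the LLM-based triple extraction the summary describes.

```python
import re
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Stand-in for an external text source; content is invented for illustration.
TOY_DOCS: Dict[str, str] = {"song_x": "song_x was recorded by artist_a."}

# One hand-written pattern per relation; a real system would use an LLM.
PATTERNS = {"recorded_by": re.compile(r"(\w+) was recorded by (\w+)")}


def search_neighbor(kg: List[Triple], entity: str, relation: str) -> Set[str]:
    """Structured action: look up tails of (entity, relation) in the KG."""
    return {t for h, r, t in kg if h == entity and r == relation}


def search_text_stub(entity: str) -> str:
    """Unstructured action: return text about the entity, or an empty string."""
    return TOY_DOCS.get(entity, "")


def extract_triples(text: str, relation: str) -> List[Triple]:
    """Turn retrieved text into triples for the requested relation."""
    pattern = PATTERNS.get(relation)
    if pattern is None:
        return []
    return [(h, relation, t) for h, t in pattern.findall(text)]


def act(kg: List[Triple], entity: str, relation: str):
    """Try the KG first; fall back to text search when the link is missing.

    Returns the tails found and any newly extracted triples.
    """
    found = search_neighbor(kg, entity, relation)
    if found:
        return found, []
    new_triples = extract_triples(search_text_stub(entity), relation)
    tails = {t for h, _, t in new_triples if h == entity}
    return tails, new_triples
```

Returning the extracted triples separately, rather than mutating the KG inside `act`, keeps it visible which facts came from outside the graph.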

Modeling

Base Model: Evaluated with Mistral-7B-Instruct-v0.2, LLaMA2-7B-Chat, and Qwen2-7B-Instruct

Training Method: Self-learning framework (Iterative SFT on self-synthesized trajectories)

Objective Functions:

Purpose: Maximize likelihood of high-quality trajectories.

Formally: $\mathcal{L}_{\mathrm{SFT}} = -\mathbb{E}\left[\sum_{j} \log \pi(x_j \mid q, x_{<j})\right]$
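
As a small numeric illustration of this objective, the sketch below computes the negative log-likelihood of each trajectory from per-token probabilities and averages over a batch to approximate the expectation. The function names and the input format (a list of token probabilities per trajectory) are assumptions made for this example; in practice the probabilities would come from the fine-tuned LLM.

```python
import math
from typing import List


def trajectory_nll(token_probs: List[float]) -> float:
    """Negative sum of log pi(x_j | q, x_<j) over one trajectory's tokens."""
    return -sum(math.log(p) for p in token_probs)


def sft_loss(batch: List[List[float]]) -> float:
    """Batch average of per-trajectory NLL, approximating the expectation."""
    return sum(trajectory_nll(probs) for probs in batch) / len(batch)


# Two toy trajectories: one with two tokens at probability 0.5,
# one with a single token predicted with certainty.
loss = sft_loss([[0.5, 0.5], [1.0]])
```

Here the first trajectory contributes 2·ln 2 and the second contributes 0, so the batch loss is ln 2.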

Adaptation: LoRA (Low-Rank Adaptation)

Training Data:

Initial dataset: Question-Answer pairs (WebQSP, CWQ)
Self-synthesis: Agent explores environment to create trajectories
Self-reflection: Agent refines failed/suboptimal trajectories
Heuristic Merge: Combines exploration and reflection based on reward (Recall)
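
One way the merge step might look is sketched below. The summary only states that exploration and reflection trajectories are combined based on a recall reward, so the selection rule here (keep the higher-recall trajectory, break ties in favor of fewer steps, discard pairs with zero reward) is an assumption, as are the `Trajectory` class and the function names.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Trajectory:
    steps: List[str]      # thought/action/observation records
    predicted: Set[str]   # final answer set produced by the trajectory


def recall(predicted: Set[str], gold: Set[str]) -> float:
    """Fraction of gold answers that appear in the predicted set."""
    if not gold:
        return 0.0
    return len(predicted & gold) / len(gold)


def merge(explored: Trajectory, reflected: Trajectory,
          gold: Set[str]) -> Optional[Trajectory]:
    """Pick one trajectory per question for the next SFT round.

    Assumed rule: higher recall wins, ties go to the shorter trajectory,
    and pairs where both rewards are zero are dropped.
    """
    r_explored = recall(explored.predicted, gold)
    r_reflected = recall(reflected.predicted, gold)
    if max(r_explored, r_reflected) == 0.0:
        return None
    if r_reflected > r_explored:
        return reflected
    if r_explored > r_reflected:
        return explored
    return min(explored, reflected, key=lambda t: len(t.steps))
```

Only the gold answers are needed to score a trajectory, which is why this kind of loop requires no human-annotated reasoning paths.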

Key Hyperparameters:

reward_metric: Recall of final answer set

Compute: Not reported in the paper

Comparison to Prior Work

vs. ToG: SymAgent incorporates external text search for missing links and uses symbolic planning, whereas ToG relies solely on existing KG paths.
vs. RoG: SymAgent is an interactive agent that learns from feedback, whereas RoG is a retrieve-then-generate pipeline.
vs. ChatKBQA: SymAgent handles incomplete KGs via external tools, whereas ChatKBQA fails if the SPARQL query cannot execute on the incomplete graph.

Limitations

Outcome-based rewards in self-learning may lead to spurious correlations (right answer for wrong reasons).
Iterative training gains plateau after a few iterations.
Error analysis shows significant 'Exceeding Maximum Steps' errors on complex datasets like MetaQA-3hop.
Reliance on heuristic merging for trajectory selection.

Reproducibility

Code: https://anonymous.4open.science/r/SymAgent/

The paper uses standard datasets (WebQSP, CWQ, MetaQA). Trajectory synthesis relies on interaction with a KG environment (simulated via a Freebase/Wikidata subset).

📊 Experiments & Results

Evaluation Setup

Complex Question Answering over Knowledge Graphs with simulated incompleteness (random triple removal).
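
Simulated incompleteness of this kind can be sketched as seeded random removal of triples. The `drop_ratio` parameter, its example value, and the function name are assumptions for illustration; the summary does not state the removal ratio or the sampling procedure used in the paper.

```python
import random
from typing import List, Tuple

Triple = Tuple[str, str, str]


def drop_triples(kg: List[Triple], drop_ratio: float, seed: int = 0):
    """Randomly remove a fraction of triples to simulate an incomplete KG.

    Returns (kept, removed); the seed makes the split reproducible.
    """
    rng = random.Random(seed)
    n_drop = int(len(kg) * drop_ratio)
    dropped = set(rng.sample(range(len(kg)), n_drop))
    kept = [t for i, t in enumerate(kg) if i not in dropped]
    removed = [t for i, t in enumerate(kg) if i in dropped]
    return kept, removed


# Ten synthetic triples; a hypothetical ratio of 0.3 removes three of them.
toy_kg = [(f"e{i}", "rel", f"e{i + 1}") for i in range(10)]
kept, removed = drop_triples(toy_kg, drop_ratio=0.3, seed=42)
```

Keeping the removed triples around is useful for checking afterwards whether the agent recovered them from external text.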

Benchmarks:

WebQuestionSP (WebQSP): multi-hop QA, up to 2 hops
Complex WebQuestions (CWQ): multi-hop QA, up to 4 hops
MetaQA-3hop: multi-hop QA, movie domain

Metrics:

Hits@1
Accuracy
F1 Score
Statistical methodology: Not explicitly reported in the paper
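
For reference, Hits@1 and F1 from the list above can be computed per question as in the sketch below. These are the commonly used set-based definitions; the paper's exact evaluation script may differ in details such as answer normalization, and the function names are chosen for this example.

```python
from typing import List, Set


def hits_at_1(ranked_predictions: List[str], gold: Set[str]) -> float:
    """1.0 if the top-ranked prediction is a gold answer, else 0.0."""
    if ranked_predictions and ranked_predictions[0] in gold:
        return 1.0
    return 0.0


def f1_score(predicted: Set[str], gold: Set[str]) -> float:
    """Harmonic mean of precision and recall over answer sets."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)
```

Dataset-level scores are then the mean of these per-question values.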

Key Results

SymAgent (Qwen2-7B) outperforms strong baselines and GPT-4 across all datasets.

Benchmark	Metric	Baseline	This Paper	Δ
WebQSP	Hits@1	54.25	78.54	+24.29
CWQ	Hits@1	41.46	58.86	+17.40
MetaQA-3hop	F1	12.61	25.76	+13.15

Self-learning effectiveness compared to distillation.

Benchmark	Metric	Baseline	This Paper	Δ
CWQ	Hits@1	54.43	58.86	+4.43

Experiment Figures

Impact of iteration numbers in the self-learning phase on WebQSP and CWQ performance.

Performance of RoG (baseline) when the KG is augmented with triples extracted by SymAgent.

Main Takeaways

SymAgent with small (7B) backbones achieves performance better than or comparable to GPT-4 on complex reasoning tasks.
The Planner module is crucial; removing it drops Hits@1 on WebQSP from 78.54% to 64.37%.
Self-learning with self-refinement is more effective than distilling from GPT-4, likely due to better distribution matching and reduced hallucination.
The model successfully identifies missing triples in KGs, validating its ability to perform automatic KG completion during reasoning.

📚 Prerequisite Knowledge

Prerequisites

Knowledge Graph Question Answering (KGQA)
Reinforcement Learning (POMDP formulation)
Logic Rules (First-order logic)
Large Language Model Agents (ReAct framework)

Key Terms

Symbolic Rules: Logical formulas (e.g., r_head(x,y) ← r1(x,z) ∧ r2(z,y)) describing patterns in the Knowledge Graph.
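
To make the rule form concrete, the sketch below applies a two-hop chain rule r_head(x,y) ← r1(x,z) ∧ r2(z,y) to a small set of triples. The relation names and the `apply_chain_rule` helper are invented for this example and do not come from the paper.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def apply_chain_rule(kg: List[Triple], r1: str, r2: str,
                     r_head: str) -> Set[Triple]:
    """Derive r_head(x, y) for every x, z, y with r1(x, z) and r2(z, y)."""
    # Index r2 edges by their head so each join is a dictionary lookup.
    r2_tails: Dict[str, Set[str]] = {}
    for h, r, t in kg:
        if r == r2:
            r2_tails.setdefault(h, set()).add(t)

    inferred: Set[Triple] = set()
    for x, r, z in kg:
        if r == r1:
            for y in r2_tails.get(z, ()):
                inferred.add((x, r_head, y))
    return inferred


toy_kg = [
    ("song_x", "recorded_by", "artist_a"),
    ("artist_a", "born_in", "city_b"),
    ("song_y", "recorded_by", "artist_c"),  # no birthplace known
]
derived = apply_chain_rule(toy_kg, "recorded_by", "born_in",
                           "artist_birthplace")
```

Only `song_x` yields a derived triple, because the second hop is missing for `artist_c`.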

ReAct: Reasoning and Acting—a prompting paradigm where LLMs generate reasoning traces (thoughts) before taking actions.

Hits@1: A metric measuring the proportion of questions where the top-ranked answer is correct.

POMDP: Partially Observable Markov Decision Process—a mathematical framework for modeling decision-making where the agent cannot see the full state of the environment.

Inductive Reasoning: Deriving general rules from specific observations (used here to infer KG patterns from similar questions).

SFT: Supervised Fine-Tuning—training a model on a labeled dataset.

LoRA: Low-Rank Adaptation—a parameter-efficient fine-tuning technique.