
Hierarchical Planning for Complex Tasks with Knowledge Graph-RAG and Symbolic Verification

C Cornelio, F Petruzzellis, P Lio
Department of Mathematics, University of Padova, Padova, Italy; Samsung AI, Cambridge, UK; Computer Science Department, University of Cambridge, Cambridge, UK
arXiv, April 2025
RAG Agent KG Reasoning Benchmark

📝 Paper Summary

Robotic Task Planning Neuro-symbolic AI
HVR is a neuro-symbolic robotic planner that combines hierarchical decomposition, Knowledge Graph RAG for context, and formal symbolic verification to improve accuracy on long-horizon tasks.
Core Problem
LLM-based robotic planners struggle with long-horizon tasks due to poor hierarchical reasoning, lack of environment-specific knowledge, and the generation of hallucinated or logically inconsistent plans.
Why it matters:
  • Robots in specialized settings (e.g., healthcare, kitchens) require precision that statistical LLMs often lack.
  • Executing incorrect plans in physical environments can be dangerous or costly; formal correctness is essential before execution.
  • Existing RAG methods improve knowledge access but do not guarantee the logical validity of the generated action sequences.
Concrete Example: In a task like 'Serve wine', an LLM might generate 'pour wine' before 'pick up bottle'. Without verification, the robot fails. HVR decomposes this into macro-actions, retrieves relevant object states (e.g., the bottle is corked), and uses a symbolic validator to catch the missing 'uncork' step and the out-of-order 'pick up' step.
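The kind of check the symbolic validator performs in the 'Serve wine' example can be illustrated with a minimal STRIPS-style simulation: each action has preconditions plus add/delete effects, and the plan is rejected at the first step whose preconditions do not hold. This is a sketch under stated assumptions, not the authors' PDDL validator; the `Action` class, `validate_plan`, and all predicate and action names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """STRIPS-style action: `pre` must hold before it runs; `add`/`delete` are its effects."""
    name: str
    pre: frozenset
    add: frozenset = frozenset()
    delete: frozenset = frozenset()


def validate_plan(initial, plan):
    """Simulate `plan` from `initial`; return (ok, index_of_failing_step, missing_preconditions)."""
    state = set(initial)
    for i, action in enumerate(plan):
        missing = action.pre - state
        if missing:
            return False, i, missing
        state -= action.delete
        state |= action.add
    return True, None, frozenset()


# Hypothetical predicates and actions for the 'Serve wine' task.
PICK = Action("pick_up_bottle", frozenset({"hand_empty", "bottle_on_table"}),
              add=frozenset({"holding_bottle"}),
              delete=frozenset({"hand_empty", "bottle_on_table"}))
UNCORK = Action("uncork_bottle", frozenset({"holding_bottle", "bottle_corked"}),
                add=frozenset({"bottle_open"}),
                delete=frozenset({"bottle_corked"}))
POUR = Action("pour_wine", frozenset({"holding_bottle", "bottle_open"}),
              add=frozenset({"glass_full"}))

initial = {"hand_empty", "bottle_on_table", "bottle_corked"}

ok, step, missing = validate_plan(initial, [POUR, PICK])
print(ok, step, sorted(missing))  # False 0 ['bottle_open', 'holding_bottle']
print(validate_plan(initial, [PICK, POUR])[:2])  # (False, 1)
print(validate_plan(initial, [PICK, UNCORK, POUR]))  # (True, None, frozenset())
```

The misordered plan fails at step 0, the plan without 'uncork' fails at step 1 because `bottle_open` never becomes true, and only the complete sequence passes. The failing index and the missing preconditions are the kind of feedback a correction step could pass back to the LLM.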
Key Novelty
HVR (Hierarchical, Verification, RAG)
  • Integrates three distinct components: Hierarchical planning (decomposing tasks into Macro Actions then Atomic Actions), KG-RAG (retrieving dynamic object states from a Knowledge Graph), and Symbolic Verification (using PDDL to check and correct logic).
  • Uses the Symbolic Validator not just for pre-execution checks but also as a runtime failure detector by comparing the expected 'ideal' world state with the observed scene graph.
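The runtime failure detector described in the last bullet compares the expected post-action world state with the observed scene graph. A minimal sketch follows, assuming the scene graph is a set of (subject, relation, object) triples and that only the facts an action is expected to add or remove are compared, so unrelated scene changes do not count as failures. `check_step`, `StepCheck`, and the triples are hypothetical illustrations, not the paper's data format.

```python
from typing import NamedTuple

Triple = tuple[str, str, str]  # (subject, relation, object) edge of a scene graph


class StepCheck(NamedTuple):
    missing: frozenset    # predicted facts the robot does not observe
    lingering: frozenset  # facts the action should have removed but are still observed

    @property
    def failed(self) -> bool:
        return bool(self.missing or self.lingering)


def check_step(should_hold: set[Triple], should_not_hold: set[Triple],
               observed: set[Triple]) -> StepCheck:
    """Compare the ideal post-action state with the observed scene graph."""
    return StepCheck(
        missing=frozenset(should_hold - observed),
        lingering=frozenset(should_not_hold & observed),
    )


# Hypothetical expected effects of pick_up(bottle).
should_hold = {("robot", "holding", "bottle")}
should_not_hold = {("bottle", "on", "table")}

# Scene graph after a slipped grasp: the bottle never left the table.
observed = {("bottle", "on", "table"), ("glass", "on", "table")}
result = check_step(should_hold, should_not_hold, observed)
print(result.failed)           # True
print(sorted(result.missing))  # [('robot', 'holding', 'bottle')]
```

Restricting the comparison to each action's own effects keeps the check cheap and avoids flagging unrelated objects; a detected discrepancy would then trigger replanning from the observed state.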
Architecture
Figure 1: The complete HVR pipeline workflow, from task input to execution.
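A minimal sketch of how the stages in Figure 1 might be wired together: the task is split into macro actions, KG context is retrieved for the objects each macro action mentions, the macro action is expanded into atomic actions, and the plan so far is verified, with validator feedback sent back for a bounded number of repairs. In HVR the decomposition and expansion are done by an LLM and the check by symbolic (PDDL) verification; here all three are hand-written stubs, and every name (`plan_task`, `retrieve_context`, `max_fixes`, the toy KG) is hypothetical rather than the authors' code.

```python
from typing import Callable, Optional

KG = dict[str, dict[str, str]]  # hypothetical KG view: object -> {attribute: value}
Decomposer = Callable[[str], list[tuple[str, list[str]]]]
Expander = Callable[[str, KG, Optional[str]], list[str]]
Verifier = Callable[[list[str]], tuple[bool, Optional[str]]]


def retrieve_context(kg: KG, objects: list[str]) -> KG:
    """Toy stand-in for KG-RAG: keep only facts about the objects a step mentions."""
    return {obj: kg[obj] for obj in objects if obj in kg}


def plan_task(task: str, kg: KG, decompose: Decomposer, expand: Expander,
              verify: Verifier, max_fixes: int = 2) -> list[str]:
    """Task -> macro actions -> atomic actions, verifying the plan so far after each expansion."""
    plan: list[str] = []
    for macro, objects in decompose(task):
        context = retrieve_context(kg, objects)
        atomic = expand(macro, context, None)
        ok, feedback = verify(plan + atomic)
        attempts = 0
        while not ok and attempts < max_fixes:
            # Return the validator's message to the expander and retry.
            atomic = expand(macro, context, feedback)
            ok, feedback = verify(plan + atomic)
            attempts += 1
        if not ok:
            raise RuntimeError(f"could not repair {macro!r}: {feedback}")
        plan.extend(atomic)
    return plan


# Hand-written stubs in place of the LLM and the PDDL validator.
kg = {"bottle": {"state": "corked", "location": "table"},
      "glass": {"state": "empty", "location": "table"}}


def toy_decompose(task):
    return [("fetch bottle", ["bottle"]), ("serve wine", ["bottle", "glass"])]


def toy_expand(macro, context, feedback):
    if macro == "fetch bottle":
        return ["pick_up_bottle"]
    steps = ["pour_wine"]
    # Only after the verifier complains does the stub consult the retrieved state.
    if feedback and context.get("bottle", {}).get("state") == "corked":
        steps.insert(0, "uncork_bottle")
    return steps


def toy_verify(plan):
    if "pour_wine" in plan and "uncork_bottle" not in plan[:plan.index("pour_wine")]:
        return False, "pour_wine requires an uncorked bottle"
    return True, None


print(plan_task("Serve wine", kg, toy_decompose, toy_expand, toy_verify))
# ['pick_up_bottle', 'uncork_bottle', 'pour_wine']
```

The first expansion of 'serve wine' omits uncorking; the verifier rejects it, and the repaired expansion uses the corked state retrieved from the KG to insert the missing step. Capping repairs with `max_fixes` keeps a stubborn expander from looping forever.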
Evaluation Highlights
  • HVR with Gemini achieves 94.19% Plan Correctness across all tasks, significantly outperforming the standard LLM baseline (17.72%) and other ablated versions.
  • On high-complexity tasks (>20 steps), HVR maintains high performance (88.39% with Gemini), whereas the standard LLM baseline drops to 3.76%.
  • Symbolic verification with correction maintains or improves plan quality: Expanded Plan Verification (EPV) scores rise slightly for Phi3 (from 47.03% to 47.39%) and remain high for Gemini (88.11% after corrections).
Breakthrough Assessment
8/10
Strong integration of symbolic methods with LLMs for robotics. The comprehensive evaluation on complex long-horizon tasks (some exceeding 40 steps) distinguishes it from work evaluated on simpler block-stacking benchmarks.