ArchRAG: Attributed Community-based Hierarchical Retrieval-Augmented Generation

Shu Wang, Yixiang Fang, Yingli Zhou, Xilin Liu, Yuchi Ma
The Chinese University of Hong Kong, Shenzhen
arXiv preprint arXiv:2502.09891 (2025)
RAG KG QA

📝 Paper Summary

Graph-based RAG pipeline
ArchRAG enhances GraphRAG by detecting attributed communities using both structure and node semantics, organizing them into a hierarchical index (C-HNSW) for efficient multi-level retrieval.
Core Problem
Existing GraphRAG methods detect communities from graph structure alone, ignoring node semantics, which yields incoherent summaries. They also rely on costly global search strategies that consume excessive tokens.
Why it matters:
  • Current community detection (e.g., Leiden) groups nodes purely by connections, often merging thematically distinct topics, which degrades the quality of LLM-generated summaries
  • Global search strategies in GraphRAG require traversing all community summaries, incurring high latency and token costs (e.g., $650 for 100 questions on Multihop-RAG)
  • Static retrieval granularities fail to simultaneously address abstract questions (requiring high-level themes) and specific questions (requiring entity-level details)
Concrete Example: GraphRAG detects communities using only structural links, potentially grouping a 'Physics' node with a 'Biology' node just because they are linked, leading to a vague summary. ArchRAG splits these based on semantic attributes to create coherent attributed communities.
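The contrast can be illustrated with a toy sketch. It is not ArchRAG's algorithm: the embeddings, the `threshold` value, and the helper names are hypothetical, and connected components stand in for a real community detector such as Leiden. The sketch also simplifies by dropping links between dissimilar nodes, which only shows how attribute similarity can separate a 'Physics' group from a 'Biology' group that structure alone would merge.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def components(nodes, edges):
    """Connected components via union-find (a stand-in for community detection)."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for u, v in edges:
        parent[find(u)] = find(v)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical 2-d attribute embeddings: axis 0 ~ physics, axis 1 ~ biology.
embeddings = {
    "quantum": [1.0, 0.1],
    "relativity": [0.9, 0.2],
    "genetics": [0.1, 1.0],
    "evolution": [0.2, 0.9],
}
# One cross-topic link ("relativity" to "genetics") joins the two topics structurally.
edges = [("quantum", "relativity"), ("genetics", "evolution"), ("relativity", "genetics")]

# Structure only: every node is reachable, so everything lands in one group.
structural = components(embeddings, edges)

# Attribute-aware (assumed rule): keep a link only if its endpoints are similar.
threshold = 0.5
coherent_edges = [(u, v) for u, v in edges
                  if cosine(embeddings[u], embeddings[v]) >= threshold]
attributed = components(embeddings, coherent_edges)

print(structural)
print(attributed)
```

Under these toy values the structure-only grouping contains all four nodes, while the attribute-aware grouping yields one physics pair and one biology pair.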
Key Novelty
Attributed Community-based Hierarchical RAG (ArchRAG)
  • Augments the knowledge graph with semantic edges based on node attribute similarity, then detects 'Attributed Communities' (ACs) that are both densely connected and semantically coherent
  • Constructs a unified hierarchical index (C-HNSW) containing both entities and community summaries across all levels, enabling efficient top-k search without traversing every node
  • Uses an adaptive filtering mechanism during generation to select only the most relevant analysis reports from the retrieved hierarchical context
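The first two ideas above can be sketched in a few lines, under loud assumptions: `add_semantic_edges`, `search_per_level`, the `min_sim` threshold, and all vectors are hypothetical, and a brute-force per-level scan stands in for C-HNSW, whose actual construction and search procedure are not described in this summary. The sketch only shows the shape of the data flow: similar nodes gain an extra edge, and one index holding both entities and community summaries is queried for the top-k items at each level.

```python
import heapq
import math
from itertools import combinations

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def add_semantic_edges(embeddings, edges, min_sim):
    """Return the structural edges plus one edge per node pair whose
    attribute similarity reaches min_sim (an assumed augmentation rule)."""
    augmented = {frozenset(e) for e in edges}
    for u, v in combinations(embeddings, 2):
        if cosine(embeddings[u], embeddings[v]) >= min_sim:
            augmented.add(frozenset((u, v)))
    return augmented

def search_per_level(index, query, k):
    """Brute-force top-k by cosine similarity within each level.
    index: list of (level, name, vector); level 0 holds entities,
    higher levels hold community summaries."""
    by_level = {}
    for level, name, vec in index:
        by_level.setdefault(level, []).append((cosine(query, vec), name))
    return {level: [name for _, name in heapq.nlargest(k, scored)]
            for level, scored in sorted(by_level.items())}

# Hypothetical entity embeddings and a single structural edge.
entity_vecs = {"quantum": [1.0, 0.1], "relativity": [0.9, 0.2], "genetics": [0.1, 1.0]}
augmented = add_semantic_edges(entity_vecs, [("quantum", "genetics")], min_sim=0.9)

# Hypothetical unified index: entities (level 0) and summaries (levels 1 and 2).
index = [(0, name, vec) for name, vec in entity_vecs.items()] + [
    (1, "physics summary", [0.95, 0.15]),
    (1, "biology summary", [0.15, 0.95]),
    (2, "science overview", [0.6, 0.6]),
]
hits = search_per_level(index, query=[1.0, 0.0], k=1)
print(sorted(tuple(sorted(e)) for e in augmented))
print(hits)
```

A real approximate-nearest-neighbor index would avoid scoring every item, which is the point of the hierarchical design; the exhaustive scan here is kept only for readability.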
Architecture
Figure 2: The overall workflow of ArchRAG, divided into Offline Indexing and Online Retrieval phases.
Evaluation Highlights
  • Achieves up to 10% higher accuracy than state-of-the-art graph-based RAG methods on specific questions while maintaining strong performance on abstract QA
  • Reduces token usage by up to 250 times compared to GraphRAG's Global Search by avoiding exhaustive community traversal
  • Consistently outperforms baselines such as GraphRAG, HippoRAG, and naive RAG across the Multihop-RAG, RGB, and UltraDomain benchmarks
Breakthrough Assessment
8/10
Substantially reduces the cost and latency problems that limit GraphRAG's scalability, while improving accuracy through higher-quality communities. The hierarchical index formulation is a strong engineering contribution.