MS-RAG: Simple and Effective Multi-Semantic Retrieval-Augmented Generation

📝 Paper Summary

Graph-based RAG pipeline

MS-RAG improves RAG accuracy and speed by combining vector-based chunk retrieval with a multi-semantic knowledge graph index, and by replacing slow LLM-based entity extraction at query time with a novel 'Mix Recall' algorithm.

Core Problem

Existing graph-based RAG methods suffer from inefficient, error-prone indexing (LLM extraction mistakes and hallucinations) and from slow inference, because they rely heavily on LLMs for entity extraction during retrieval.

Why it matters:

GraphRAG's reliance on LLM-based extraction is slow and costly, hindering industrial application
Missing nodes or edges in LLM-constructed graphs can make graph retrieval perform worse than naive RAG (10-30% of benchmark samples have missing elements)
Current methods fail to bridge the gap between structured graph reasoning and the speed/robustness of dense vector retrieval

Concrete Example: For the question 'Who is the mutual friend of Mike and Tom?', if the LLM fails to extract the edge between 'Mike' and 'Bill' during indexing, standard graph retrieval fails completely. Because MS-RAG indexes chunk-level vectors alongside graph nodes, it can recover the answer even when specific graph edges are missing.
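The failure mode in this example can be illustrated with a toy sketch. The graph, the chunk texts, and the token-overlap score (a crude stand-in for dense vector similarity) are all assumptions made for illustration; none of them come from the paper.

```python
# Toy illustration: a graph-only lookup fails on a missing edge,
# while a chunk-level fallback still surfaces the relevant text.
# All names, chunks, and the token-overlap score are hypothetical stand-ins.

# Knowledge graph as extracted by the LLM; the Mike-Bill edge was missed.
graph = {
    "Mike": set(),       # the edge to Bill is missing
    "Tom": {"Bill"},
    "Bill": {"Tom"},
}

chunks = [
    "Mike and Bill have been friends since college.",
    "Tom met Bill at work and they became friends.",
    "Paris hosted a conference last year.",
]


def mutual_friends(graph, a, b):
    """Graph-only answer: intersection of the two neighbor sets."""
    return graph.get(a, set()) & graph.get(b, set())


def tokens(text):
    """Lowercased word set with surrounding punctuation stripped."""
    return {w.strip(".,?!'").lower() for w in text.split()}


def chunk_fallback(query, chunks, k=2):
    """Rank chunks by token overlap with the query (stand-in for vector search)."""
    q = tokens(query)
    ranked = sorted(chunks, key=lambda c: len(q & tokens(c)), reverse=True)
    return ranked[:k]


query = "Who is the mutual friend of Mike and Tom?"
graph_answer = mutual_friends(graph, "Mike", "Tom")  # empty: the edge is missing
recovered = chunk_fallback(query, chunks)            # both chunks mention Bill
```

With the edge missing, the graph lookup returns an empty set, yet the two top-ranked chunks both mention Bill, which is the redundancy the chunk index is meant to provide.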

Key Novelty

Multi-Semantic Indexing with Mix Recall

Create a unified index containing three levels of information: raw text chunks, extracted entities, and relations, all encoded as dense vectors
Replace slow LLM-based entity extraction at inference time with fast vector search to identify relevant graph nodes
Perform 'Mix Recall': retrieve relevant vectors (chunks) and graph neighbors (entities/relations) simultaneously, then fuse results via a voting-based reranker

Architecture

The retrieval pipeline of MS-RAG, detailing the Mix Recall and Multi-Semantic Rerank stages.

Evaluation Highlights

Achieves state-of-the-art retrieval performance on HotpotQA, improving Recall@2 by 18.6 percentage points (77.6% vs 59.0%) over HippoRAG
Inference is ~5x faster than GraphRAG (0.76s vs 4.12s per query) while achieving significantly higher correctness (72.3% vs 27.7%)
Outperforms IRCoT+BM25 by 18.1 percentage points of Recall@2 on average across three multi-hop datasets

Breakthrough Assessment

8/10

Significant improvements in both accuracy (SOTA retrieval) and efficiency (5x faster) by successfully hybridizing vector and graph approaches, addressing key industrial bottlenecks of GraphRAG.

⚙️ Technical Details

Problem Definition

Setting: Open-domain multi-hop question answering using external knowledge

Inputs: Natural language query q

Outputs: Ranked list of relevant passages and final answer

Pipeline Flow

Multi-Semantic Indexing (Chunking → KG Extraction → Vector Encoding)
Mix Recall (Vector Search → Graph Neighbor Expansion)
Multi-Semantic Rerank (Voting → LLM Reranking)

System Modules

Multi-Semantic Indexer

Build databases for chunks, entities, and relations, all encoded as vectors

Model or implementation: BGE-M3 or Contriever (Encoder), GPT-3.5-turbo (KG Extraction)
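A minimal sketch of what such a three-level index might look like. The `embed` function is a hashed bag-of-words placeholder, not BGE-M3 or Contriever, and the triples are hand-written rather than extracted by GPT-3.5-turbo. The class and field names, and the choice to store source-chunk pointers on entities and relations, are assumptions for illustration.

```python
from dataclasses import dataclass, field
import hashlib
import math

DIM = 16  # toy embedding size (assumption)


def embed(text):
    """Placeholder encoder: hashed bag-of-words, L2-normalized.
    A real system would call a dense encoder here instead."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


@dataclass
class MultiSemanticIndex:
    """Three parallel stores; entities and relations point back to source chunks."""
    chunks: dict = field(default_factory=dict)     # chunk_id -> (text, vector)
    entities: dict = field(default_factory=dict)   # name -> (vector, {chunk_ids})
    relations: dict = field(default_factory=dict)  # (head, rel, tail) -> (vector, {chunk_ids})

    def add_chunk(self, chunk_id, text, triples):
        """Index one chunk together with the triples extracted from it."""
        self.chunks[chunk_id] = (text, embed(text))
        for head, rel, tail in triples:
            for name in (head, tail):
                _, sources = self.entities.setdefault(name, (embed(name), set()))
                sources.add(chunk_id)
            key = (head, rel, tail)
            _, sources = self.relations.setdefault(key, (embed(" ".join(key)), set()))
            sources.add(chunk_id)


index = MultiSemanticIndex()
index.add_chunk("c1", "Tom met Bill at work.", [("Tom", "colleague_of", "Bill")])
index.add_chunk("c2", "Mike and Bill are friends.", [("Mike", "friend_of", "Bill")])
```

Keeping a source-chunk pointer on every entity and relation is one way to let later stages map graph hits back to passages.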

Mix Recall Retriever

Retrieve relevant items using vector search and graph traversal

Model or implementation: Vector Search (e.g., FAISS)
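The combination of vector search and graph traversal can be sketched as follows. This is an illustrative reading of the description, not the paper's algorithm: it uses toy 2-d vectors, a brute-force dot product in place of a FAISS index, and a breadth-first expansion bounded by a hop count. Relation search is omitted for brevity, and all function and variable names are hypothetical.

```python
from collections import deque


def top_k(query_vec, items, k):
    """Brute-force similarity search: the k keys with the highest dot product."""
    def score(key):
        return sum(q * x for q, x in zip(query_vec, items[key]))
    return sorted(items, key=score, reverse=True)[:k]


def expand(graph, seeds, max_hops):
    """Breadth-first expansion: all entities within max_hops of any seed."""
    seen = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen


def mix_recall(query_vec, entity_vecs, chunk_vecs, graph, n_entities, n_chunks, max_hops):
    """Seed entities come from vector search (no LLM call), then the graph is
    expanded around them; chunks are retrieved by vector search in the same pass."""
    seeds = top_k(query_vec, entity_vecs, n_entities)
    return {
        "entities": expand(graph, seeds, max_hops),
        "chunks": top_k(query_vec, chunk_vecs, n_chunks),
    }


# Toy data (hypothetical).
entity_vecs = {"Mike": [1.0, 0.0], "Tom": [0.9, 0.1], "Bill": [0.2, 0.8], "Paris": [0.0, 1.0]}
chunk_vecs = {"c1": [0.95, 0.05], "c2": [0.1, 0.9]}
graph = {"Mike": ["Bill"], "Tom": ["Bill"], "Bill": ["Mike", "Tom"], "Paris": []}

result = mix_recall([1.0, 0.0], entity_vecs, chunk_vecs, graph,
                    n_entities=2, n_chunks=1, max_hops=1)
```

Here the two seed entities are found purely by similarity, and the one-hop expansion pulls in their shared neighbor without any LLM call at query time.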

Multi-Semantic Reranker

Filter and re-order retrieved items

Model or implementation: Qwen-7B (Lightweight LLM)
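The voting stage that precedes LLM reranking might look like the sketch below. It assumes that each recalled item casts one vote for every chunk it originated from and that the top-voted chunks are then handed to the lightweight LLM. The paper's exact voting rule may differ, and the LLM reranking call itself is omitted. All names are hypothetical.

```python
from collections import Counter


def vote(chunk_hits, entity_sources, relation_sources, n_candidates):
    """Count one vote per recalled item for each chunk it points to.

    chunk_hits:       chunk ids returned by chunk-level vector search
    entity_sources:   list of chunk-id sets, one per recalled entity
    relation_sources: list of chunk-id sets, one per recalled relation
    """
    votes = Counter(chunk_hits)
    for sources in list(entity_sources) + list(relation_sources):
        votes.update(sources)
    # Sort by vote count (descending), then by chunk id for a deterministic order.
    ranked = sorted(votes, key=lambda c: (-votes[c], c))
    return ranked[:n_candidates]


candidates = vote(
    chunk_hits=["c1", "c3"],
    entity_sources=[{"c1", "c2"}, {"c2"}],
    relation_sources=[{"c2"}],
    n_candidates=2,
)
```

In this toy input, chunk `c2` was not returned by chunk-level search at all but still ranks first because two entities and one relation point to it.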

Novel Architectural Elements

Mix Recall Algorithm: Replaces LLM-based entity extraction with vector search over an entity index, combined with simultaneous chunk retrieval
Multi-Semantic Index structure: Explicitly maintains and queries three parallel indices (chunk, entity, relation) to provide redundancy against KG construction errors

Modeling

Base Model: Qwen-32B (for QA evaluation), Qwen-7B (for reranking), GPT-3.5-turbo (for indexing)

Training Method: Zero-shot prompting for graph construction and reranking

Key Hyperparameters:

lambda1 (entities retrieved): 5
lambda2 (relations retrieved): 10
lambda3 (chunks retrieved): 10
lambda4 (voting candidates): 5
lambda_h (graph hops): 4
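For experimentation, the values listed above could be collected in a single configuration object. The class and field names below are illustrative and do not come from the paper; only the numeric defaults are taken from the list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MSRAGConfig:
    """Retrieval hyperparameters as listed above; field names are illustrative."""
    num_entities: int = 5     # lambda1: entities retrieved
    num_relations: int = 10   # lambda2: relations retrieved
    num_chunks: int = 10      # lambda3: chunks retrieved
    num_candidates: int = 5   # lambda4: voting candidates
    max_hops: int = 4         # lambda_h: graph hops


config = MSRAGConfig()
```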

Compute: Inference time: 0.76s per query (vs 4.12s for GraphRAG)

Comparison to Prior Work

vs. GraphRAG: MS-RAG finds graph entry points with vector search instead of LLM extraction and avoids the slow community-building step
vs. HippoRAG: MS-RAG incorporates raw chunk retrieval to handle missing graph nodes/edges, whereas HippoRAG relies solely on the graph structure
vs. HybridRAG [not cited in paper]: Similar in combining vector+graph, but MS-RAG integrates them at the index/retrieval level via 'Mix Recall' rather than just merging final results

Limitations

Still reliant on LLM for initial graph construction (though mitigated by chunk index)
Index construction cost (LLM calls) is higher than pure vector methods
Performance depends on the quality of the underlying embedding model

Reproducibility

The paper does not state whether code is available. Hyperparameters and prompts are detailed in the paper and appendix. Dataset splits follow prior work (IRCoT, HippoRAG).

📊 Experiments & Results

Evaluation Setup

Single-hop and Multi-hop QA on standard benchmarks

Benchmarks:

HotpotQA (Multi-hop reasoning QA)
2WikiMultiHopQA (Multi-hop QA)
MuSiQue (Multi-hop QA)

Metrics:

Recall@2 (R@2)
Recall@5 (R@5)
Correctness/Diversity/Comprehension (LLM-judged QA quality)
Statistical methodology: Not explicitly reported in the paper

Key Results

Benchmark	Metric	Baseline	This Paper	Δ
Single-step retrieval results show MS-RAG significantly outperforming baselines, especially on HotpotQA.
HotpotQA	Recall@2	59.0	77.6	+18.6
HotpotQA	Recall@2	69.5	79.4	+9.9
Average (3 datasets)	Recall@2	57.2	66.2	+9.0
Multi-step retrieval results (using IRCoT) confirm MS-RAG provides better documents for reasoning chains.
Average (3 datasets)	Recall@2	66.2	68.2	+2.0
Average (3 datasets)	Recall@2	53.7	71.8	+18.1
QA quality evaluation shows MS-RAG generates more correct and comprehensive answers than GraphRAG.
Mixed datasets (100 samples)	Correctness	27.7	72.3	+44.6
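As a quick consistency check, the Δ column can be recomputed from the Baseline and This Paper columns. The snippet below copies the six rows from the table and treats Δ as a difference in percentage points, which is an interpretation of the table rather than something it states.

```python
# (benchmark, metric, baseline, this_paper, reported_delta) copied from the table above.
rows = [
    ("HotpotQA", "Recall@2", 59.0, 77.6, 18.6),
    ("HotpotQA", "Recall@2", 69.5, 79.4, 9.9),
    ("Average (3 datasets)", "Recall@2", 57.2, 66.2, 9.0),
    ("Average (3 datasets)", "Recall@2", 66.2, 68.2, 2.0),
    ("Average (3 datasets)", "Recall@2", 53.7, 71.8, 18.1),
    ("Mixed datasets (100 samples)", "Correctness", 27.7, 72.3, 44.6),
]

# Recompute each delta (rounded to one decimal) and keep rows that disagree.
mismatches = [row for row in rows if round(row[3] - row[2], 1) != row[4]]
```

Every reported Δ matches the recomputed difference, so `mismatches` is empty.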

Experiment Figures

Comparison of graph retrieval failure modes (missing nodes/edges) and performance impact.

Main Takeaways

MS-RAG consistently outperforms strong baselines (HippoRAG, GraphRAG) in both retrieval accuracy and generation correctness.
The method is robust to 'missing graph node' errors (common in LLM-generated KGs) by falling back on chunk-level vector retrieval.
Inference is about 5x faster than GraphRAG (0.76s vs 4.12s per query) because it replaces expensive LLM calls with vector search during the retrieval phase.

📚 Prerequisite Knowledge

Prerequisites

Retrieval-Augmented Generation (RAG)
Knowledge Graphs (KG)
Dense vector retrieval
Graph traversal algorithms

Key Terms

MS-RAG: Multi-Semantic Retrieval-Augmented Generation, the proposed system combining vector and graph indices

GraphRAG: A baseline method that uses LLMs to build hierarchical graph communities for retrieval

HippoRAG: A baseline method using Personalized PageRank on knowledge graphs for retrieval
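For readers unfamiliar with Personalized PageRank, here is a generic textbook power-iteration sketch. It is not HippoRAG's implementation; the toy graph, the damping factor, and the iteration count are assumptions chosen for illustration.

```python
def personalized_pagerank(graph, seeds, damping=0.85, iterations=50):
    """Power iteration for Personalized PageRank on an adjacency dict.

    graph: node -> list of out-neighbors (every neighbor must also be a key)
    seeds: nodes the random walk restarts from (uniform restart distribution)
    """
    nodes = list(graph)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for node in nodes:
            out = graph[node]
            if out:
                share = damping * rank[node] / len(out)
                for neighbor in out:
                    nxt[neighbor] += share
            else:
                # Dangling node: return its mass to the restart distribution.
                for n in nodes:
                    nxt[n] += damping * rank[node] * restart[n]
        rank = nxt
    return rank


# Toy graph (hypothetical): query entities Mike and Tom both link to Bill.
graph = {"Mike": ["Bill"], "Tom": ["Bill"], "Bill": ["Mike", "Tom"], "Paris": []}
scores = personalized_pagerank(graph, seeds={"Mike", "Tom"})
best = max(scores, key=scores.get)
```

Restarting the walk at the query entities concentrates score on nodes close to them, so the shared neighbor ranks highest while the unconnected node receives no score.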

Recall@k: The percentage of questions where the correct answer is present in the top-k retrieved documents
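The definition above can be computed as in the sketch below, which counts a question as a hit when at least one gold document appears in its top-k results. The document ids are made up. Evaluation scripts for multi-hop benchmarks may instead average the fraction of gold supporting passages retrieved; which variant the paper uses is not stated here.

```python
def recall_at_k(retrieved, gold, k):
    """Percentage of questions with at least one gold document in the top-k results.

    retrieved: list of ranked document-id lists, one per question
    gold:      list of gold document-id sets, one per question
    """
    hits = sum(
        1 for ranked, answers in zip(retrieved, gold)
        if set(ranked[:k]) & answers
    )
    return 100.0 * hits / len(retrieved)


# Toy rankings for three questions (hypothetical ids).
retrieved = [["d1", "d7", "d3"], ["d4", "d2", "d9"], ["d5", "d6", "d8"]]
gold = [{"d3"}, {"d4"}, {"d0"}]

r_at_2 = recall_at_k(retrieved, gold, k=2)  # one of three questions hits
r_at_3 = recall_at_k(retrieved, gold, k=3)  # two of three questions hit
```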

Mix Recall: The proposed retrieval algorithm that combines vector search for entities/chunks with graph traversal for neighbors

Entity Disambiguation: Merging entities with different names but identical semantics (e.g., 'Fred Gehrke' and 'Clarence Fred Gehrke')

Dense Retrieval: Retrieving documents based on semantic similarity of vector embeddings rather than keyword matching