
Equipping Retrieval-Augmented Large Language Models with Document Structure Awareness

L Xu, C Feng, K Zhang, L Zhengyong, W Xu, F Meng
School of Computer Science, Beijing Institute of Technology, Ant Group
arXiv, 10/2025
RAG QA

📝 Paper Summary

Modularized RAG pipeline
RDR2 improves RAG by using an LLM router to actively navigate document headings and sections like a human reader, rather than treating documents as flat lists of isolated chunks.
Core Problem
Standard RAG systems treat retrieved passages as isolated chunks, discarding the original document structure (headings, hierarchy) that helps humans navigate and synthesize complex information.
Why it matters:
  • Losing structural context forces models to implicitly reconstruct relationships that were explicitly present in the source, harming multi-hop reasoning
  • Fixed chunking strategies restrict query-adaptive content selection, often missing relevant details buried in related sections
  • Flat retrieval paradigms struggle with 'factual-inductive' queries that require synthesizing multiple fragments scattered across a document
Concrete Example: When answering a complex question about a specific entity, standard RAG might retrieve three disjoint paragraphs. RDR2 instead locates the relevant heading in the document tree, then decides to 'expand' that section to read adjacent context, effectively re-assembling the complete evidence.
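To make the "document tree" in this example concrete, here is a minimal Python sketch of how a heading hierarchy could be held in memory. It is an illustration only, not the paper's construction: the `Node` class, the `build_tree` and `outline` helpers, and the `(level, heading, text)` input format are all hypothetical names and assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """One heading in the tree, with the text directly under it (hypothetical type)."""
    heading: str
    level: int
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def build_tree(title: str, sections: List[Tuple[int, str, str]]) -> Node:
    """Nest (level, heading, text) tuples, given in document order, under a root.

    Assumes every section level is >= 1, so the root (level 0) is never popped.
    """
    root = Node(heading=title, level=0)
    stack = [root]  # path from the root to the most recently added node
    for level, heading, text in sections:
        # Climb back up until the top of the stack is a strict ancestor level.
        while stack[-1].level >= level:
            stack.pop()
        node = Node(heading, level, text)
        stack[-1].children.append(node)
        stack.append(node)
    return root


def outline(node: Node, depth: int = 0) -> List[str]:
    """Return the indented headings, i.e. the structure a reader would skim first."""
    lines = ["  " * depth + node.heading]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


doc = build_tree("Marie Curie", [
    (1, "Early life", "Born in Warsaw in 1867."),
    (1, "Research", "Overview of her scientific work."),
    (2, "Radioactivity", "Coined the term radioactivity."),
    (2, "Polonium and radium", "Discovered two elements in 1898."),
    (1, "Awards", "Two Nobel Prizes."),
])
print("\n".join(outline(doc)))
```

A flat chunker would store the five section texts as unrelated passages; the tree keeps the fact that "Radioactivity" and "Polonium and radium" are siblings under "Research", which is the kind of adjacency the example above relies on.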
Key Novelty
Retrieve-DocumentRoute-Read (RDR2)
  • Formulates document reading as a dynamic routing task over a Document Structure Tree (DST), where an agent iteratively decides to Answer, Expand (unfold headings), or Refuse content
  • Introduces a method to automatically curate training data for this routing policy using only questions and documents (no answer supervision required), enabling the router to learn human-like browsing strategies
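The routing idea can be sketched as a small loop over a tree. The Python below is a hedged illustration, not the authors' implementation: the action semantics (Answer keeps a node's text as evidence, Expand unfolds its children, Refuse discards it) are one reading of the summary, and `Node`, `route`, `toy_policy`, and `max_iters` are hypothetical names. In particular, `toy_policy` is a keyword-overlap stand-in for the trained LLM router.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Action labels as named in the summary; their exact semantics here are assumed.
ANS, EXP, REF = "[ANS]", "[EXP]", "[REF]"


@dataclass
class Node:
    heading: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def route(root: Node, question: str,
          policy: Callable[[str, Node], str], max_iters: int = 3) -> List[str]:
    """Iteratively route over the visible headings and collect evidence text."""
    frontier = list(root.children)  # headings currently visible to the router
    evidence: List[str] = []
    for _ in range(max_iters):
        if not frontier:
            break
        next_frontier: List[Node] = []
        for node in frontier:
            action = policy(question, node)
            if action == ANS:
                evidence.append(node.text)          # keep this section for the reader
            elif action == EXP:
                next_frontier.extend(node.children)  # unfold the heading
            # REF: drop the node and everything beneath it
        frontier = next_frontier
    return evidence


def toy_policy(question: str, node: Node) -> str:
    """Stand-in for the LLM router: act on word overlap with the heading."""
    overlap = set(question.lower().split()) & set(node.heading.lower().split())
    if not overlap:
        return REF
    return EXP if node.children else ANS


doc = Node("Marie Curie", children=[
    Node("Early life", "Born in Warsaw in 1867."),
    Node("Research", "Overview of her scientific work.", children=[
        Node("Radioactivity", "Coined the term radioactivity."),
        Node("Polonium and radium", "Discovered two elements in 1898."),
    ]),
    Node("Awards", "Two Nobel Prizes."),
])

question = "what did her research on radioactivity establish"
print(route(doc, question, toy_policy, max_iters=1))
print(route(doc, question, toy_policy, max_iters=3))
```

In this toy setup a single iteration only reaches the top-level headings and returns no evidence, while allowing more iterations lets the router unfold "Research" and reach the subsection beneath it. That is only a cartoon of the iteration budget; the real router is a trained LLM and its behaviour is not reproduced here.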
Architecture
Figure 2
The 3-stage RDR2 pipeline: Retrieve, Document Route, and Read. It details the iterative routing process where an LLM selects actions ([ANS], [EXP], [REF]) on a tree structure.
Evaluation Highlights
  • Achieves state-of-the-art results on ASQA (+1.5 EM) and QAMPARI (+3.0 F1-5) using only off-the-shelf retrievers and readers
  • Outperforms methods built on proprietary models (such as ASC, which uses ChatGPT) while generating answers that are ~50% shorter
  • Demonstrates effective test-time scaling: increasing expansion iterations consistently improves passage utility and answer quality without retraining
Breakthrough Assessment
8/10
Strong conceptual novelty in treating documents as trees rather than flat chunks. Achieves SOTA on difficult benchmarks with a lightweight, efficiently trained router, showing excellent generalization.