
Single-Pass Document Scanning for Question Answering

W Cao, J Wang, Y Zheng, L Bao, Q Zheng…
arXiv, April 2025
RAG QA Memory

📝 Paper Summary

Modularized RAG pipeline Long-context retrieval
The Single-Pass Scanner uses a state-space model to process entire documents in linear time, identifying relevant sentences by conditioning on the full preceding context rather than isolated chunks.
Core Problem
Question answering over very large documents is difficult: chunk-based embeddings lose global context, while full-context transformers pay an attention cost that grows quadratically with document length.
Why it matters:
  • Standard RAG splits documents into short chunks, which severs the connections between distant parts of the text that complex questions depend on
  • Full-context LLMs like GPT-4o are too expensive and slow to process hundreds of thousands of tokens for every query
Concrete Example: Chunk-based methods might retrieve a passage mentioning a character's action but miss the motivation explained 200 pages earlier. The Single-Pass Scanner reads the whole book at once to link these dependencies.
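
To make the scaling argument concrete, here is a back-of-the-envelope sketch in Python. It is a simplified proxy, not a FLOP count from the paper: it assumes full self-attention does work proportional to the number of token pairs and a recurrent scan does one state update per token, and it ignores constants such as hidden size and layer count. The function names are hypothetical.

```python
def attention_pair_count(n_tokens: int) -> int:
    """Token-pair interactions in full self-attention: grows as n^2."""
    return n_tokens * n_tokens


def scan_step_count(n_tokens: int) -> int:
    """State updates in a recurrent scan: one per token, grows as n."""
    return n_tokens


def growth(cost, n_small: int, n_large: int) -> float:
    """How many times more work the larger input needs under a cost model."""
    return cost(n_large) / cost(n_small)


# Going from a 10k-token document to a 256k-token one:
attn_growth = growth(attention_pair_count, 10_000, 256_000)
scan_growth = growth(scan_step_count, 10_000, 256_000)
print(f"attention work grows {attn_growth:.2f}x, scan work grows {scan_growth:.2f}x")
```

Under these assumptions the document is 25.6 times longer, so the scan does 25.6 times more work while pairwise attention does 25.6² ≈ 655 times more.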
Key Novelty
Linear-Time Full-Context Scanning via State-Space Models
  • Adapts the Mamba-2 architecture to scan a concatenated query and document in a single pass, maintaining a running hidden state of the entire context
  • Replaces the language modeling head with a binary classification head that scores every sentence's relevance based on all tokens that came before it
  • Introduces a 'link-based' synthetic data generation method that creates training questions requiring information from two distant, thematically linked document chunks
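
The summary does not say how linked chunks are found or how questions are written from them, so the following is only an illustrative sketch of the pairing step. It assumes "thematically linked" can be approximated by word overlap (Jaccard similarity) and "distant" by a minimum gap in chunk index; `jaccard`, `linked_pairs`, `min_gap`, and `min_sim` are hypothetical names, and generating the actual question from a pair is left out.

```python
def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity, used here as a crude proxy for a thematic link."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def linked_pairs(chunks, min_gap=2, min_sim=0.2):
    """Return (i, j, similarity) for chunks that are far apart yet similar."""
    pairs = []
    for i in range(len(chunks)):
        # Only consider chunks at least `min_gap` positions later in the document.
        for j in range(i + min_gap, len(chunks)):
            sim = jaccard(chunks[i], chunks[j])
            if sim >= min_sim:
                pairs.append((i, j, sim))
    # Strongest links first.
    return sorted(pairs, key=lambda p: p[2], reverse=True)


chunks = [
    "the captain hid the map in the lighthouse",
    "rain fell on the harbor all week",
    "merchants argued over grain prices",
    "years later the map in the lighthouse was found",
]
pairs = linked_pairs(chunks)
print(pairs)
```

Here only chunks 0 and 3 qualify: they are three positions apart and share the map and lighthouse vocabulary. A training question built from such a pair would need both chunks to be answered, which is the property the link-based method is after.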
Architecture
Figure 1: Illustration of the Single-Pass Scanner processing a long document. It scans the concatenated Query + Document in one pass and assigns a relevance score to each sentence.
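
The data flow in the figure can be sketched in plain Python. This is a toy stand-in, not the Mamba-2 layer: it assumes a fixed-decay linear recurrence in place of the real state-space block, a deterministic fake embedding (`toy_embed`), and an untrained linear head, so the scores themselves carry no meaning. What it does show is the shape of the computation: one pass over query plus document, one running state, and one score read off at the end of each sentence.

```python
import math

DIM = 4  # toy hidden size (assumption)


def toy_embed(token):
    """Deterministic stand-in for a learned token embedding."""
    code = sum(ord(ch) for ch in token)
    return [((code * (i + 1)) % 11) / 11.0 for i in range(DIM)]


def scan_scores(query_tokens, sentences, decay=0.9, head_w=None, head_b=0.0):
    """Scan query + document once and emit one relevance score per sentence.

    `sentences` is a list of token lists. The state after a sentence's last
    token summarizes the query and all document text up to that point.
    """
    head_w = head_w or [1.0] * DIM
    state = [0.0] * DIM
    steps = 0

    def update(token):
        nonlocal state, steps
        x = toy_embed(token)
        # Fixed-decay recurrence: constant work per token, so linear overall.
        state = [decay * h + (1.0 - decay) * xi for h, xi in zip(state, x)]
        steps += 1

    for tok in query_tokens:  # the query is read first
        update(tok)

    scores = []
    for sent in sentences:
        for tok in sent:
            update(tok)
        # Binary classification head applied at the sentence boundary.
        logit = sum(w * h for w, h in zip(head_w, state)) + head_b
        scores.append(1.0 / (1.0 + math.exp(-logit)))
    return scores, steps


query = "why did she leave".split()
doc = [s.split() for s in ["She packed at dawn .", "The letter explained everything ."]]
scores, steps = scan_scores(query, doc)
print(scores, steps)
```

Each token is visited exactly once (`steps` equals the total token count), and the score for a sentence depends on everything before it rather than on that sentence in isolation.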
Evaluation Highlights
  • Outperforms state-of-the-art embedding models (NV-Embed-v2-7B, Stella-1.5B) across 41 long-document QA benchmarks while using fewer FLOPs
  • Achieves performance comparable to GPT-4o on documents >256k tokens while retrieving only ~1,600 tokens (50 sentences)
  • Generalizes well beyond its 10k-token training length, handling documents of up to 256k tokens
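
The retrieval budget quoted above (50 sentences, roughly 1,600 tokens, so about 32 tokens per sentence on average) suggests a simple top-k step over the per-sentence scores. The sketch below is an assumption about how such a step could look, not the authors' code; `select_top_sentences` is a hypothetical name, and restoring document order after ranking is a design choice made here for readability of the retrieved context.

```python
def select_top_sentences(sentences, scores, k=50):
    """Keep the k highest-scoring sentences, returned in document order."""
    if len(sentences) != len(scores):
        raise ValueError("need exactly one score per sentence")
    # Rank sentence indices by score, highest first, and keep the top k.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Restore original order so the selected context reads coherently.
    return [sentences[i] for i in sorted(ranked)]


sentences = ["A", "B", "C", "D"]
scores = [0.1, 0.9, 0.4, 0.8]
top = select_top_sentences(sentences, scores, k=2)
print(top)
```

With `k=2` the example keeps "B" and "D", the two highest-scoring sentences, in the order they appear in the document.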
Breakthrough Assessment
8/10
Strong empirical results: the model outperforms leading MTEB embedding models with a faster, linear-complexity architecture. Its ability to generalize from a 10k-token training context to 256k-token test documents is particularly impressive.