
OpenDecoder: Open Large Language Model Decoding to Incorporate Document Quality in RAG

F Mo, Z Su, Y Hui, J Zhang, JA Sun, Z Liu, C Zhang…
Université de Montréal, Clemson University, University of Notre Dame, Georgia Institute of Technology, Waseda University
arXiv, 1/2026
RAG QA

📝 Paper Summary

Modularized RAG pipeline
OpenDecoder modifies the LLM's internal attention mechanism during decoding by explicitly injecting external relevance scores (from the retriever, the ranker, and query performance prediction, QPP) so that noisy or irrelevant retrieved documents are downweighted.
Core Problem
Standard RAG models assume that retrieved documents are relevant and process them with ordinary self-attention, which cannot distinguish useful evidence from noise when retrieval quality varies.
Why it matters:
  • Retrieval systems frequently return irrelevant or noisy documents, which degrades LLM generation quality and causes hallucinations.
  • Existing methods rely on prompting or black-box fine-tuning, but the internal attention mechanism still treats all input tokens as potentially relevant context without explicit quality guidance.
  • Prompt-based filtering strategies are sensitive to templates and increase latency, while standard fine-tuning doesn't structurally change how the model attends to noise.
Concrete Example: When an LLM is asked a question but the retriever returns completely irrelevant documents, a standard RAG model might hallucinate an answer based on the noise or its internal parametric knowledge without knowing which to trust. OpenDecoder uses explicit low relevance scores to force the attention mechanism to ignore the retrieved context and rely on internal knowledge.
Key Novelty
Explicit Indicator-Guided Decoding
  • Injects external quality signals (retriever scores, ranker scores, query performance prediction) directly into the attention mask during generation.
  • Modulates the attention scores so the model structurally attends less to tokens from documents marked as low-quality by external evaluators.
  • Trains the model to utilize these injected scores via a robustness training curriculum that mixes relevant, partially relevant, and irrelevant documents.
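The bullets above describe the mechanism only at a high level. As a minimal sketch of one way such modulation could work, the Python below adds the log of a per-document quality score to each token's attention logit before the softmax, which is equivalent to multiplying the attention weight by the score and renormalizing. The exact injection formula is not given in this summary, so the log-bias form, the function name `quality_weighted_attention`, the `eps` floor, and the convention that query tokens carry a score of 1.0 are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def quality_weighted_attention(logits, token_doc_ids, doc_scores, eps=1e-6):
    """Bias attention logits by the quality score of each token's source document.

    Hypothetical formulation (not taken from the paper): adding log(score)
    to a logit scales the resulting softmax weight by that score.
    """
    biased = []
    for logit, doc_id in zip(logits, token_doc_ids):
        # Assumption: tokens outside any retrieved document (e.g. the question)
        # keep full weight.
        score = 1.0 if doc_id is None else doc_scores[doc_id]
        # eps avoids log(0) for documents scored exactly zero.
        biased.append(logit + math.log(max(score, eps)))
    return softmax(biased)


# One query token, two tokens from document 0, one token from document 1.
logits = [1.0, 1.0, 1.0, 1.0]
token_doc_ids = [None, 0, 0, 1]
doc_scores = {0: 0.9, 1: 0.1}  # document 1 is judged low quality

weights = quality_weighted_attention(logits, token_doc_ids, doc_scores)
```

With equal raw logits, the token from the low-scoring document receives a smaller share of attention than the tokens from the high-scoring one, while the weights still sum to 1. A real implementation would apply this per head and per layer on tensors; this sketch covers a single attention row only.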
Architecture
Architecture Figure: Figures 1 & 2
Comparison of Vanilla RAG decoding vs. OpenDecoder. Shows how OpenDecoder takes external scores (Retriever, Ranker, QPP), normalizes them, and injects them into the Attention mechanism.
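The normalization step mentioned in the caption can be illustrated with a short sketch. The summary does not say which normalization the authors use, so min-max scaling is an assumption here, as are the function name `min_max_normalize` and the choice to map a set of identical scores to 1.0.

```python
def min_max_normalize(scores):
    """Rescale raw relevance scores to the range [0, 1].

    Assumed scheme for illustration; the paper's normalization may differ.
    """
    lowest, highest = min(scores), max(scores)
    if highest == lowest:
        # Arbitrary choice: identical scores carry no ranking signal,
        # so leave every document at full weight.
        return [1.0 for _ in scores]
    return [(s - lowest) / (highest - lowest) for s in scores]


# Raw retriever scores for three retrieved documents (made-up values).
retriever_scores = [12.0, 8.0, 4.0]
normalized = min_max_normalize(retriever_scores)
```

Min-max scaling is relative: the best document in a batch always maps to 1.0, even when every retrieved document is irrelevant. A scheme with an absolute scale, such as a sigmoid over ranker logits, may be a better fit when all retrieved documents can be noise.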
Evaluation Highlights
  • Outperforms vanilla RAG and robust baselines (such as RobustRAG and RbFT) across five QA benchmarks in noisy settings.
  • Achieves higher F1 scores in 'Extreme Noisy' settings (100% irrelevant documents) by effectively ignoring noise.
  • Demonstrates that combining multiple indicators (Retriever + Ranker + QPP) yields better performance than single indicators.
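One simple way to combine several indicators is a weighted average of scores that already lie in [0, 1]. The sketch below is a hypothetical aggregation, not the paper's method: the function name `combine_indicators`, the equal default weights, and the treatment of QPP as a single per-query value shared by all documents are assumptions.

```python
def combine_indicators(retriever, ranker, qpp, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Blend per-document retriever and ranker scores with a per-query QPP score.

    All inputs are assumed to be normalized to [0, 1] already.
    """
    if len(retriever) != len(ranker):
        raise ValueError("retriever and ranker need one score per document")
    w_retriever, w_ranker, w_qpp = weights
    return [
        w_retriever * r + w_ranker * k + w_qpp * qpp
        for r, k in zip(retriever, ranker)
    ]


# Made-up normalized scores for three retrieved documents.
retriever = [1.0, 0.5, 0.0]
ranker = [0.8, 0.6, 0.1]
qpp = 0.5  # assumed query-level estimate of retrieval quality

combined = combine_indicators(retriever, ranker, qpp)
```

The combined scores preserve the ordering on which the retriever and ranker agree, and the shared QPP term shifts every document up or down according to how well retrieval is predicted to have worked for the query.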
Breakthrough Assessment
7/10
Novel architectural modification to the attention mechanism for RAG robustness. Moves beyond simple prompting or filtering to structural integration of relevance signals. Strong empirical results on standard benchmarks.