
Parameters vs. Context: Fine-Grained Control of Knowledge Reliance in Language Models

B Bi, S Liu, Y Wang, Y Xu, J Fang, L Mei, X Cheng
Institute of Information Engineering, Chinese Academy of Sciences, School of Cyber Security, University of Chinese Academy of Sciences
arXiv, 3/2025
RAG Factuality QA

📝 Paper Summary

Modularized RAG pipeline Hallucination suppression
CK-PLUG is an inference-time method that detects knowledge conflicts via entropy shifts and modulates token probabilities to control whether an LLM relies on its internal parameters or retrieved context.
Core Problem
RAG systems struggle to balance reliance between internal parametric knowledge and external retrieved context, especially when conflicts arise (e.g., outdated internal knowledge vs. noisy retrieval).
Why it matters:
  • Excessive reliance on noisy retrieval leads to hallucinations, while ignoring retrieval when the model's internal knowledge is outdated causes factual errors
  • Current alignment methods for factuality or faithfulness are often unidirectional and lack the flexibility to adapt to varying retrieval quality at inference time
  • Users need customizable control to prioritize either internal reliability or external evidence depending on the deployment scenario (e.g., trusted professional retrieval vs. adversarial web data)
Concrete Example: When asked 'Where is London?', an LLM may internally know 'England' but retrieve a counterfactual context stating 'London is in France'. Without control, the model may blend the two sources unpredictably. CK-PLUG lets users set a parameter α to steer the answer toward 'England' (parametric) or 'France' (contextual) as needed.
Key Novelty
Confidence Gain (CG)-driven decoding modulation
  • Introduces 'Confidence Gain', a metric measuring the entropy shift in token distributions before and after context injection to detect knowledge conflicts
  • Uses a plug-and-play decoding strategy that blends parameter-aware and context-aware probability distributions only when conflicts are detected
  • Provides a single scalar α to manually tune reliance, or an adaptive mode that self-regulates based on model confidence without retraining
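The three points above can be illustrated with a small sketch. This is not the authors' implementation: the summary does not give the exact formulas, so the code assumes that Confidence Gain is the entropy of the next-token distribution without context minus the entropy with context, that a negative value signals a conflict, and that the blend is a log-linear interpolation in which α = 1 means fully parametric and α = 0 means fully contextual. The function names, the toy three-word vocabulary, and the probabilities are all hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def confidence_gain(p_param, p_ctx):
    """Entropy before context injection minus entropy after it.

    Assumption: a negative value (the context made the model less
    certain) is treated as a sign of a knowledge conflict.
    """
    return entropy(p_param) - entropy(p_ctx)


def blend(p_param, p_ctx, alpha, eps=1e-12):
    """Log-linear interpolation of the two distributions, renormalized.

    Assumption: alpha=1 keeps the parametric distribution and
    alpha=0 keeps the context-aware one.
    """
    scores = [
        alpha * math.log(pp + eps) + (1.0 - alpha) * math.log(pc + eps)
        for pp, pc in zip(p_param, p_ctx)
    ]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def modulated_distribution(p_param, p_ctx, alpha):
    """Blend only when a conflict is detected; otherwise keep the
    ordinary context-aware distribution unchanged."""
    if confidence_gain(p_param, p_ctx) >= 0.0:
        return list(p_ctx)
    return blend(p_param, p_ctx, alpha)


# Toy next-token distributions for "Where is London?" (made-up numbers).
vocab = ["England", "France", "Spain"]
p_param = [0.90, 0.05, 0.05]  # model alone: confident in "England"
p_ctx = [0.45, 0.50, 0.05]    # with counterfactual context: torn

cg = confidence_gain(p_param, p_ctx)  # negative here, so a conflict
for alpha in (1.0, 0.0):
    dist = modulated_distribution(p_param, p_ctx, alpha)
    print(alpha, vocab[dist.index(max(dist))])
```

In this toy case the context raises the entropy of the distribution, so the blend is triggered, and the two extreme settings of α select 'England' and 'France' respectively. When the context lowers the entropy instead, the sketch leaves the context-aware distribution untouched, which mirrors the "only when conflicts are detected" behavior described above. The adaptive mode is not sketched here because the summary does not say how α is derived from model confidence.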
Architecture
Figure 4: The CK-PLUG inference pipeline, showing the parallel computation of parametric and context-aware distributions and their fusion.
Evaluation Highlights
  • Adjusts Memory Recall (MR) on LLaMA3-8B from 9.9% to 71.9% in counterfactual scenarios, significantly widening the control range compared to the fixed baseline of 42.1%
  • Achieves consistent performance improvements across six diverse RAG tasks (including NQ, HotpotQA, FEVER) using the adaptive auto-configuration mode
  • Maintains generation fluency and accuracy while modulating knowledge preference, validated by hit rates comparable to baselines even under strong control settings
Breakthrough Assessment
7/10
Offers a lightweight, training-free solution to a critical RAG problem (knowledge conflicts). The ability to linearly control reliance is practical, though the core mechanism is a decoding heuristic rather than a fundamental architectural change.