
Reinforcement learning for optimizing RAG for domain chatbots

M Kulkarni, P Tangarajan, K Kim, A Trivedi
Not explicitly reported in the paper
arXiv, 1/2024 (2024)
RAG RL QA

📝 Paper Summary

Modularized RAG pipeline Cost optimization
A reinforcement learning policy dynamically decides whether to retrieve external context or rely on LLM parametric knowledge, optimizing costs by reducing token usage without degrading answer quality.
Core Problem
Standard RAG pipelines retrieve context for every query, inflating costs and latency for queries where the LLM already knows the answer or context is redundant (e.g., follow-ups).
Why it matters:
  • For paid API-based LLMs, costs scale with input token count; retrieving unnecessary context significantly increases operational expenses
  • Large context windows can sometimes degrade LLM accuracy or cause hallucinations due to information overload
  • Retrieval latency can slow down user interactions for simple conversational turns like greetings or clarifications
Concrete Example: For a follow-up query like 'can you reduce it?' after 'is there an annual fee?', a standard RAG pipeline fetches new (likely irrelevant) context. The proposed method recognizes that the conversation history already contains the necessary information and skips retrieval, saving tokens.
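The gating step in this example can be sketched as follows. This is a minimal illustration, not the paper's implementation: `Turn`, `build_prompt`, and `toy_policy` are hypothetical names, the policy is a keyword heuristic standing in for the trained model, and the sketch assumes that context retrieved in earlier turns is kept in the history.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple

FETCH = "[FETCH]"
NO_FETCH = "[NO_FETCH]"


@dataclass
class Turn:
    query: str
    context: str  # context retrieved for this turn ("" if retrieval was skipped)
    answer: str


def build_prompt(
    history: List[Turn],
    query: str,
    policy: Callable[[List[Turn], str], str],
    retrieve: Callable[[str], str],
) -> Tuple[str, str]:
    """Ask the policy for an action and call the retriever only on FETCH."""
    action = policy(history, query)
    context = retrieve(query) if action == FETCH else ""
    parts = []
    for turn in history:
        if turn.context:
            parts.append(f"Context: {turn.context}")
        parts.append(f"User: {turn.query}")
        parts.append(f"Bot: {turn.answer}")
    if context:
        parts.append(f"Context: {context}")
    parts.append(f"User: {query}")
    return action, "\n".join(parts)


def toy_policy(history: List[Turn], query: str) -> str:
    """Hypothetical stand-in for the trained policy: skip retrieval when
    there is prior history and the query looks like a follow-up."""
    words = re.findall(r"[a-z']+", query.lower())
    is_follow_up = any(w in ("it", "that", "this") for w in words)
    return NO_FETCH if history and is_follow_up else FETCH
```

With a one-turn history for 'is there an annual fee?', calling `build_prompt(history, "can you reduce it?", toy_policy, retrieve)` returns `[NO_FETCH]` and never invokes `retrieve`, so the prompt carries only the existing history plus the new query.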
Key Novelty
Policy-Based Retrieval Triggering
  • Trains a lightweight policy network (BERT-based) external to the RAG pipeline to act as a gatekeeper
  • The policy decides between [FETCH] and [NO_FETCH] actions based on conversation history
  • Uses GPT-4 as a reward model to train the policy, rewarding it for correctly skipping retrieval when the LLM can answer accurately without it
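One way to picture the reward signal described above is the sketch below. The function name, the numeric values, and the `fetch_cost` penalty are assumptions chosen for illustration; the summary does not state the paper's actual reward values. `answer_good` stands for the reward model's (GPT-4's) judgment of the bot's answer.

```python
def reward(
    action: str,
    answer_good: bool,
    good: float = 1.0,        # hypothetical reward for an answer judged good
    bad: float = -1.0,        # hypothetical reward for an answer judged bad
    fetch_cost: float = 0.5,  # hypothetical penalty for the tokens a fetch adds
) -> float:
    """Combine the answer-quality judgment with a cost for retrieving.

    Skipping retrieval and still answering well scores highest;
    fetching and still answering badly scores lowest.
    """
    quality = good if answer_good else bad
    cost = fetch_cost if action == "[FETCH]" else 0.0
    return quality - cost
```

Under these assumed values, a good answer without retrieval earns 1.0, a good answer with retrieval earns 0.5, and a bad answer is penalized in either case, so the policy is pushed to skip retrieval only when quality holds.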
Architecture
Figure 1: The architecture of the policy-based RAG optimization approach. It details the interaction between the Policy Model, the RAG pipeline, and the Reward Model (GPT-4).
Evaluation Highlights
  • Achieved ~31% cost savings (token reduction) on a test chat session while maintaining or slightly improving accuracy
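The cost-savings figure is a relative reduction in tokens sent to the LLM. A minimal sketch of that arithmetic is below; `token_savings` is a hypothetical helper, and the per-turn token counts are made-up numbers used only to show the calculation, not data from the paper.

```python
from typing import Sequence


def token_savings(baseline: Sequence[int], with_policy: Sequence[int]) -> float:
    """Fraction of input tokens saved over a session relative to always-retrieve RAG."""
    baseline_total = sum(baseline)
    if baseline_total <= 0:
        raise ValueError("baseline token count must be positive")
    return 1 - sum(with_policy) / baseline_total


# Hypothetical per-turn input token counts for a four-turn session.
always_fetch = [300, 250, 250, 200]  # retrieval on every turn
gated = [300, 100, 250, 100]         # retrieval skipped on turns 2 and 4

savings = token_savings(always_fetch, gated)  # 1 - 750/1000 = 0.25
```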
  • In-house embedding model trained with InfoNCE loss significantly outperformed the public e5-base-v2 model on out-of-domain query detection (0.55 vs 0.77 similarity gap)
  • GPT-4 evaluation of bot responses showed 100% agreement with manual verification on a sample session
Breakthrough Assessment
6/10
Practical application of RL for cost/latency optimization in industrial RAG systems. While the RL method is standard, the application to selective retrieval with GPT-4 rewards is effective.