
Enhancing LLM Intelligence with ARM-RAG: Auxiliary Rationale Memory for Retrieval Augmented Generation

E Melz
SearchStax
arXiv, 11/2023
RAG Memory Reasoning

📝 Paper Summary

Memory recall Modularized RAG pipeline
ARM-RAG improves LLM problem-solving by storing successful reasoning chains (rationales) from past attempts in a dense retrieval index and using them as few-shot examples for similar future problems.
Core Problem
LLMs often fail to solve complex reasoning problems (like math) and do not learn from their successes or failures without expensive retraining or fine-tuning.
Why it matters:
  • Frozen LLMs are static and cannot acquire new problem-solving strategies over time
  • Fine-tuning approaches (like STaR) require substantial data and compute resources
  • Standard RAG typically retrieves factual documents, not problem-solving strategies or reasoning patterns needed for logic tasks
Concrete Example: When asked a math problem about house flipping profits, GPT-3.5 might miscalculate the initial value by adding repair costs incorrectly. However, if prompted with a 'rationale' (a step-by-step solution) from a similar correctly solved problem, it avoids this structural error.
Key Novelty
Auxiliary Rationale Memory (ARM)
  • Store the 'thought process' (step-by-step reasoning) of successfully solved problems in a vector database, rather than just factual documents
  • At inference time, retrieve these successful reasoning chains based on problem similarity to use as in-context learning demonstrations
  • Use 'obfuscation' (replacing nouns/names with nonsense words) during retrieval to force the retriever to match on problem structure/logic rather than surface-level keywords
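The store, obfuscate, and retrieve steps above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the bag-of-words `embed` function stands in for a real dense encoder, the `mask_words` list stands in for automatic noun/name detection, and the `RationaleMemory` class, the `wug` tokens, and the example rationales are all hypothetical.

```python
import math
import re
from collections import Counter


def obfuscate(text, mask_words):
    """Swap each word found in mask_words (lowercase set) for a nonsense token."""
    mapping = {}

    def swap(match):
        word = match.group(0)
        key = word.lower()
        if key not in mask_words:
            return word
        # The same word always maps to the same token within one text.
        mapping.setdefault(key, f"wug{len(mapping)}")
        return mapping[key]

    return re.sub(r"[A-Za-z]+", swap, text)


def embed(text):
    """Toy bag-of-words vector; an assumed stand-in for a dense encoder."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class RationaleMemory:
    """Hypothetical memory of (question, rationale) pairs, keyed on obfuscated text."""

    def __init__(self, mask_words=()):
        self.mask_words = {w.lower() for w in mask_words}
        self.entries = []  # list of (vector, question, rationale)

    def _key(self, question):
        return embed(obfuscate(question, self.mask_words))

    def add(self, question, rationale):
        self.entries.append((self._key(question), question, rationale))

    def retrieve(self, question, k=1):
        query = self._key(question)
        ranked = sorted(self.entries, key=lambda e: cosine(query, e[0]), reverse=True)
        return [(q, r) for _, q, r in ranked[:k]]

    def build_prompt(self, question, k=1):
        # Retrieved rationales become few-shot demonstrations before the new question.
        shots = [f"Q: {q}\nA: {r}" for q, r in self.retrieve(question, k)]
        return "\n\n".join(shots + [f"Q: {question}\nA:"])


memory = RationaleMemory(mask_words=["Josh", "house", "Maria", "apples", "Tom", "pears"])
memory.add(
    "Josh buys a house for 80000 and spends 50000 on repairs. "
    "What is his profit if the value rose 150 percent?",
    "Add purchase price and repair cost to get total spend, then subtract it from the sale value.",
)
memory.add(
    "Maria has 5 apples and eats 2. How many apples are left?",
    "Start with 5, subtract the 2 eaten: 5 - 2 = 3.",
)

new_question = "Tom has 9 pears and eats 4. How many pears are left?"
prompt = memory.build_prompt(new_question, k=1)
print(prompt)
```

With the names and nouns masked, the new question and the stored apples question reduce to nearly the same token sequence, so the subtraction rationale ranks first even though the two problems share no surface entities.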
Evaluation Highlights
  • +4.2 percentage-point accuracy improvement (77.4% vs 73.2%) on GSM8K for Obfuscated ARM-RAG over the GPT-3.5 baseline
  • Multi-attempt questioning (voting/best-of-N) alone achieves 91.9% accuracy, providing a rich source of correct rationales for the memory
  • Strong prompting (providing the correct answer as a hint) yields 80% accuracy, validating that optimal context significantly aids performance
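The multi-attempt result suggests a simple way to populate the memory: sample several attempts per question and keep a rationale from any attempt that reaches the right answer. The sketch below assumes correctness is checked against a known reference answer; `harvest_rationales`, the `solve` callable, and the scripted demo data are hypothetical and stand in for real LLM calls.

```python
def harvest_rationales(problems, solve, attempts=5):
    """Collect one correct rationale per question that is solved in any attempt.

    problems: list of (question, reference_answer) pairs.
    solve:    callable (question, attempt_index) -> (rationale, answer);
              an assumed wrapper around a sampled LLM call.
    Returns (harvested, accuracy), where accuracy is the fraction of
    questions solved by at least one attempt.
    """
    harvested = []
    solved = 0
    for question, reference in problems:
        correct = []
        for attempt in range(attempts):
            rationale, answer = solve(question, attempt)
            if answer == reference:
                correct.append(rationale)
        if correct:
            solved += 1
            harvested.append((question, correct[0]))
    accuracy = solved / len(problems) if problems else 0.0
    return harvested, accuracy


# Scripted stand-in for a sampled model: two attempts per question.
scripted = {
    "q1": [("adds instead of subtracting", "5"), ("subtracts correctly", "4")],
    "q2": [("misreads the question", "1"), ("arithmetic slip", "2")],
}
demo_problems = [("q1", "4"), ("q2", "7")]
demo_harvested, demo_accuracy = harvest_rationales(
    demo_problems, lambda q, i: scripted[q][i], attempts=2
)
print(demo_harvested, demo_accuracy)
```

Only questions with at least one correct attempt contribute a rationale, so the memory never stores reasoning that led to a wrong answer.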
Breakthrough Assessment
4/10
Proposes a logical extension to RAG (retrieving rationales instead of factual documents), but the empirical gains are modest (about 4 percentage points on GSM8K) and the system relies on a basic pipeline without novel training or architecture.