
Qwen Technical Report

Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenhang Ge, Yu Han, Fei Huang, Binyuan Hui, Luo Ji, Mei Li, Junyang Lin, Runji Lin, Dayiheng Liu, Gao Liu, Chengqiang Lu, K. Lu, Jianxin Ma, Rui Men, Xingzhang Ren, Xuancheng Ren, Chuanqi Tan, Sinan Tan, Jianhong Tu, Peng Wang, Shijie Wang, Wei Wang, Shengguang Wu, et al.
Not reported in the paper
arXiv.org (2023)
Memory P13N Benchmark

📝 Paper Summary

Memory recall Sparse memory QA
A comprehensive survey categorizing how Large Language Models are augmented with external memory to solve knowledge-intensive tasks like question answering and fact verification.
Core Problem
LLMs hallucinate and lack up-to-date information when relying solely on internal parameters for knowledge-intensive tasks.
Why it matters:
  • Internal parameters require expensive retraining to update knowledge
  • High-stakes applications (medical, legal) cannot tolerate hallucinations common in pure parametric models
  • Long-tail knowledge is often poorly represented in pre-training data
Concrete Example: Asked about a recent event such as 'Who won the 2023 World Cup?', a model whose training data ends in 2022 will either hallucinate an answer or decline to answer, whereas a memory-augmented model can retrieve the relevant news article and answer from it.
Key Novelty
Taxonomy of Memory-Augmented LLMs
  • Categorizes methods into two main phases: Retrieval (finding relevant info) and Generation (using info to answer)
  • Classifies retrieval into Sparse (keyword matching) and Dense (semantic embedding matching) approaches
  • Distinguishes generation strategies: concatenating memory to input vs. fusing memory into model architecture
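The retrieve-then-generate split in the taxonomy above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names are hypothetical, the sparse scorer is plain keyword overlap (a stand-in for methods such as BM25), the dense scorer is cosine similarity over vectors supplied by the caller (no real embedding model is involved), and generation is shown only as the "concatenate memory to input" strategy.

```python
import math
from collections import Counter

_PUNCT = ".,?!'\""

def tokenize(text):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    tokens = (t.strip(_PUNCT).lower() for t in text.split())
    return [t for t in tokens if t]

def sparse_score(query, doc):
    """Sparse retrieval signal: number of query tokens also found in the document."""
    q, d = Counter(tokenize(query)), Counter(tokenize(doc))
    return sum(min(q[t], d[t]) for t in q)

def cosine(u, v):
    """Dense retrieval signal: cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def retrieve_sparse(query, docs, k=1):
    """Return the k documents with the highest keyword overlap."""
    return sorted(docs, key=lambda d: sparse_score(query, d), reverse=True)[:k]

def retrieve_dense(query_vec, doc_vecs, docs, k=1):
    """Return the k documents whose vectors are closest to the query vector."""
    order = sorted(range(len(docs)),
                   key=lambda i: cosine(query_vec, doc_vecs[i]),
                   reverse=True)
    return [docs[i] for i in order[:k]]

def build_prompt(question, passages):
    """Generation by concatenation: prepend retrieved passages to the question."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

# Made-up placeholder memory, not real data.
docs = [
    "News: Team A won the 2023 World Cup final.",
    "Travel guide: the museum opens at nine.",
]
question = "Who won the 2023 World Cup?"
prompt = build_prompt(question, retrieve_sparse(question, docs))
```

The resulting `prompt` would then be passed to a language model. The architecture-fusion strategy mentioned in the last bullet changes the model itself and is not covered by this sketch.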
Evaluation Highlights
  • Surveys performance on KILT benchmark (Knowledge Intensive Language Tasks)
  • Highlights RAG (Retrieval-Augmented Generation) achieving 44.39 EM on Natural Questions
  • Notes FiD (Fusion-in-Decoder) achieving 51.4 EM on Natural Questions by processing documents in parallel
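The EM (Exact Match) figures above are percentages of questions whose predicted answer matches a reference answer after normalization. The summary does not say how the normalization is done; the sketch below assumes the common SQuAD-style recipe (lowercase, strip punctuation and English articles, collapse whitespace), and the function names are hypothetical.

```python
import re
import string

_PUNCTUATION = set(string.punctuation)

def normalize_answer(text):
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, gold_answers):
    """1.0 if the prediction equals any gold answer after normalization, else 0.0."""
    pred = normalize_answer(prediction)
    return float(any(pred == normalize_answer(g) for g in gold_answers))

def em_score(predictions, references):
    """Corpus-level EM as a percentage; references[i] lists the gold answers for item i."""
    if not predictions:
        raise ValueError("predictions must be non-empty")
    hits = sum(exact_match(p, r) for p, r in zip(predictions, references))
    return 100.0 * hits / len(predictions)
```

Under this definition a score such as 51.4 EM means that roughly half of the test questions were answered with a string that matches a reference exactly, so partially correct answers earn no credit.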
Breakthrough Assessment
4/10
This is a survey paper summarizing existing work rather than proposing a new method. It provides a useful taxonomy but no novel algorithm.