
RAG for AI-generated Content: A Survey

(China) Penghao Zhao, Hailin Zhang, ..., Jie Jiang, Bin Cui
Peking University, Tencent Inc.
arXiv, 2/2024
RAG MM Benchmark KG

📝 Paper Summary

Modularized RAG pipeline | Survey of RAG methods
This survey provides a comprehensive review of Retrieval-Augmented Generation (RAG) across the entire AIGC landscape, distilling foundational augmentation paradigms and summarizing applications beyond just text generation.
Core Problem
Existing AIGC models struggle with outdated knowledge, long-tail data scarcity, data leakage risks, and high costs, while current RAG literature often focuses narrowly on text generation or specific components.
Why it matters:
  • Lack of a unified perspective on RAG foundations hinders the exploration of augmentation methods beyond simple query-based input augmentation
  • Researchers overlook the potential of RAG in non-text modalities (image, video, audio) due to the text-centric focus of existing surveys
  • Practitioners lack guidelines on how to adapt retrievers and generators for specific multimodal applications
Concrete Example: Text RAG is well known, but applying RAG to image generation (e.g., retrieving reference images to guide Stable Diffusion) requires different augmentation paradigms, such as latent representation blending, that are rarely discussed alongside text methods.
Key Novelty
Unified Abstraction of RAG Foundations
  • Classifies RAG not just by application but by 'foundation'—how the retrieved information interacts with the generation process (Input, Latent, Logit, or Process)
  • Extends the RAG scope beyond Large Language Models (LLMs) to the broader AIGC landscape, including GANs, Diffusion models, and Transformers across diverse modalities
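The difference between the Input, Latent, and Logit foundations can be sketched with toy functions. This is an illustrative sketch only, not code from the survey: the names `Foundation`, `augment_input`, `blend_latent`, `blend_logit`, and the `weight` parameter are hypothetical, the linear blends are assumed stand-ins for whatever fusion a real model uses, and the Process foundation is listed but not implemented.

```python
from __future__ import annotations

from enum import Enum


class Foundation(Enum):
    """Where retrieved information enters generation (labels from the summary)."""
    INPUT = "input"
    LATENT = "latent"
    LOGIT = "logit"
    PROCESS = "process"  # not sketched here


def augment_input(query: str, retrieved: list[str]) -> str:
    """Input level: prepend retrieved passages to the query text."""
    if not retrieved:
        return query
    context = "\n".join(retrieved)
    return f"{context}\n\n{query}"


def blend_latent(hidden: list[float], retrieved: list[float],
                 weight: float = 0.5) -> list[float]:
    """Latent level: mix a retrieved embedding into a hidden state.

    A plain linear blend is an assumption made for illustration.
    """
    if len(hidden) != len(retrieved):
        raise ValueError("vectors must have the same length")
    return [(1 - weight) * h + weight * r for h, r in zip(hidden, retrieved)]


def blend_logit(model_probs: dict[str, float],
                retrieval_probs: dict[str, float],
                weight: float = 0.25) -> dict[str, float]:
    """Logit level: interpolate the generator's next-token distribution
    with a distribution derived from retrieved items."""
    tokens = set(model_probs) | set(retrieval_probs)
    return {
        t: (1 - weight) * model_probs.get(t, 0.0)
        + weight * retrieval_probs.get(t, 0.0)
        for t in tokens
    }
```

In this sketch the retriever is the same in all three cases; only the point at which its output meets the generator changes, which is the distinction the taxonomy draws.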
Architecture
Figure 1: A unified framework of Retrieval-Augmented Generation (RAG) applicable across modalities
Evaluation Highlights
  • Not applicable — this is a survey paper
Breakthrough Assessment
8/10
A highly comprehensive survey that successfully broadens the definition of RAG beyond just LLMs to all AIGC modalities, offering a valuable unified taxonomy for future research.