
A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT

Yihan Cao, Siyu Li, Yixin Liu, Zhiling Yan, Yutong Dai, Philip S. Yu, Lichao Sun
Lehigh University, Carnegie Mellon University, University of Illinois at Chicago
arXiv (2023)
Pretraining MM RL Speech QA

📝 Paper Summary

Generative AI, AI-Generated Content (AIGC), Multimodal Generation
This survey provides a comprehensive roadmap of AI-Generated Content, tracing the evolution from early GANs to modern Large Language Models and multimodal systems, classifying key techniques and applications.
Core Problem
The rapid emergence of diverse generative models (ChatGPT, DALL-E 2) has created a fragmented landscape, making it difficult to understand the underlying connections, historical evolution, and common foundations of AIGC.
Why it matters:
  • AIGC is reshaping industries like art, advertising, and education by automating high-quality content creation.
  • Understanding the shift from unimodal to multimodal generation is critical for future research directions.
  • Identifying open problems (like safety and reasoning) is necessary to guide the next phase of generative AI development.
Concrete Example: Without a unifying survey, the link between seemingly separate lines of work, such as GANs in computer vision and Transformers in NLP, was unclear, obscuring how they converged into modern multimodal models like CLIP and DALL-E 2.
Key Novelty
Unified AIGC Taxonomy
  • Classifies generative models into unimodal (text-to-text, image-to-image) and multimodal (cross-modal generation) categories.
  • Identifies the Transformer architecture as the convergence point where the previously distinct paths of computer vision and natural language processing merged.
  • Highlights the role of Reinforcement Learning from Human Feedback (RLHF) in aligning generative outputs with human intent.
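The unimodal/multimodal split in the first bullet can be sketched as a small data structure. This is a minimal illustration, assuming the rule is simply "same input and output modality means unimodal"; the names `Modality`, `GenerativeTask`, and `category` are hypothetical and do not come from the survey, and the text-to-image entry is an assumed example of cross-modal generation.

```python
from dataclasses import dataclass
from enum import Enum


class Modality(Enum):
    TEXT = "text"
    IMAGE = "image"


@dataclass(frozen=True)
class GenerativeTask:
    """Hypothetical record describing one generation task."""
    name: str
    source: Modality  # modality of the instruction / input
    target: Modality  # modality of the generated content


def category(task: GenerativeTask) -> str:
    """Return 'unimodal' if input and output share a modality, else 'multimodal'."""
    return "unimodal" if task.source is task.target else "multimodal"


tasks = [
    GenerativeTask("text-to-text", Modality.TEXT, Modality.TEXT),
    GenerativeTask("image-to-image", Modality.IMAGE, Modality.IMAGE),
    # Assumed cross-modal example, not taken from the summary itself.
    GenerativeTask("text-to-image", Modality.TEXT, Modality.IMAGE),
]

for task in tasks:
    print(f"{task.name}: {category(task)}")
```

A real taxonomy would likely need more modalities and finer distinctions; the sketch only captures the top-level split described above.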
Architecture
Figure 2: Overview of the AIGC workflow, distinguishing unimodal from multimodal models
Evaluation Highlights
  • Reviews the transition from small-scale models (GPT-2, 1.5B parameters) to large foundation models (GPT-3, 175B parameters), enabling better generalization.
  • Contrasts hardware generations, noting that NVIDIA A100 GPUs achieve 7x faster BERT-large inference than V100s.
  • Summarizes the shift in computer vision from GAN dominance to Diffusion models (e.g., DALL-E 2) for higher stability and resolution.
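To put the parameter counts in the first bullet in perspective, the jump from GPT-2 to GPT-3 can be worked out directly. This is simple arithmetic on the two figures quoted above; the constant names are illustrative only.

```python
# Parameter counts as quoted in the summary above.
GPT2_PARAMS = 1.5e9   # GPT-2: 1.5B parameters
GPT3_PARAMS = 175e9   # GPT-3: 175B parameters

# Ratio of the two model sizes.
scale_factor = GPT3_PARAMS / GPT2_PARAMS

print(f"GPT-3 has roughly {scale_factor:.0f}x the parameters of GPT-2")
```

The ratio comes out to a little over two orders of magnitude, which is the scale gap the survey associates with better generalization.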
Breakthrough Assessment
8/10
A timely and extensive literature review that organizes the chaotic explosion of generative AI into a structured history and taxonomy, though it is a survey rather than a new method.