
Parameter-efficient fine-tuning of large-scale pre-trained language models

Ning Ding, Yujia Qin, Guang Yang, Fu Wei, Zonghan Yang, Yusheng Su, Shengding Hu, Yulin Chen, Chi-Min Chan, Weize Chen, Jing Yi, Weilin Zhao, Xiaozhi Wang, Zhiyuan Liu, Haitao Zheng, Jianfei Chen, Y. Liu, Jie Tang, Juanzi Li, Maosong Sun
Department of Computer Science and Technology, Tsinghua University, Beijing, China; Beijing Academy of Artificial Intelligence, Beijing, China; Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, China
Nature Machine Intelligence (2023)

📝 Paper Summary

Parameter-Efficient Fine-Tuning (PEFT) Large Language Model Adaptation
The paper unifies parameter-efficient adaptation methods under the framework of 'delta-tuning' and empirically demonstrates that optimizing a tiny fraction of parameters yields performance comparable to full fine-tuning while significantly reducing computational costs.
Core Problem
As pre-trained language models (PLMs) scale to billions of parameters, standard full-parameter fine-tuning becomes computationally prohibitive and storage-intensive, making deployment impractical for many applications.
Why it matters:
  • Fine-tuning GPT-3 requires updating all ~175 billion (175,255 million) parameters, which is infeasible for most researchers and industry practitioners
  • Storing separate full-model instances for every downstream task consumes massive storage
  • Existing research on efficient tuning was fragmented across different methods without a unified theoretical or empirical comparison framework
Concrete Example: Adapting GPT-3 to a specific task via vanilla fine-tuning requires updating ~175 billion parameters. In contrast, Low-Rank Adaptation (LoRA) updates only ~37.7 million parameters (low-rank matrices injected into the attention layers), yet achieves similar results.
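The ~37.7 million figure can be recovered with simple arithmetic. A minimal sketch, assuming GPT-3 175B's publicly documented architecture (96 layers, hidden size 12288) and the LoRA paper's common configuration of rank 8 applied to the query and value projections; these configuration values are assumptions, not stated in the summary above:

```python
# Assumed GPT-3 175B configuration (not stated in the summary above).
n_layers = 96     # transformer blocks
d_model = 12288   # hidden size
rank = 8          # LoRA rank (assumed configuration)
n_adapted = 2     # LoRA applied to the query and value projections only

# Each adapted d x d weight matrix gets two low-rank factors:
# A with shape (rank, d_model) and B with shape (d_model, rank).
params_per_matrix = 2 * d_model * rank
lora_params = n_layers * n_adapted * params_per_matrix

print(lora_params)          # 37748736, i.e. ~37.7 million
print(lora_params / 175e9)  # ~0.0002, i.e. roughly 0.02% of the full model
```

Under these assumptions the trainable-parameter count comes out to exactly 37,748,736, matching the ~37.7M figure cited above.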
Key Novelty
Delta-Tuning Framework
  • Unifies diverse methods (LoRA, Adapter, Prefix-tuning, BitFit) under the concept of 'delta-tuning': optimizing a small 'delta' (change) in parameters while freezing the vast majority of the pre-trained model
  • Categorizes methods into three types: Addition-based (adding new modules), Specification-based (tuning specific existing params like biases), and Reparameterization-based (transforming optimization into low-rank subspaces)
  • Provides theoretical grounding using Optimal Control (viewing adaptation as steering a system) and Optimization theory (leveraging low intrinsic dimensionality of PLMs)
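The reparameterization-based category is easiest to see in code. A minimal NumPy sketch of a LoRA-style linear layer: the pre-trained weight W is frozen, and the trainable "delta" is expressed as a low-rank product B @ A. All dimensions here are toy values chosen for illustration, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64   # toy hidden size (real PLMs are far larger)
r = 4    # low-rank bottleneck: only 2*d*r parameters are trainable

# Frozen pre-trained weight: never updated during delta-tuning.
W = rng.standard_normal((d, d))

# Trainable delta, reparameterized as a rank-r product B @ A.
A = rng.standard_normal((r, d)) * 0.01  # small random init
B = np.zeros((d, r))                    # zero init, so the delta starts at 0

def lora_forward(x):
    """Forward pass: frozen path plus the low-rank delta path."""
    return x @ W.T + x @ (B @ A).T

x = rng.standard_normal((2, d))
# With B = 0, the adapted model reproduces the frozen model exactly,
# so training starts from the pre-trained behavior.
assert np.allclose(lora_forward(x), x @ W.T)
```

During adaptation only A and B receive gradients (2 × 64 × 4 = 512 parameters here versus 4,096 in W), which is the sense in which delta-tuning optimizes a small "delta" while the vast majority of the model stays frozen.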
Evaluation Highlights
  • Delta-tuning achieves performance comparable to full fine-tuning (average score 67.31 vs. 69.27 for full fine-tuning) across over 100 NLP tasks while tuning <1% of the parameters
  • Adapters achieve an average score of 66.80 vs. 69.27 for full fine-tuning, despite tuning only ~2.38% of the parameters
  • Manual templates boost zero-shot performance on RoBERTa-Large from 23.7 to 43.4, showing the importance of prompt design in low-resource settings
Breakthrough Assessment
9/10
This is a foundational analysis paper that defined the term 'delta-tuning' (now standard) and provided the first comprehensive, large-scale empirical and theoretical unification of PEFT methods.