
Foundations of Large Language Models

T Xiao, J Zhu
Northeastern University (NLP Lab), NiuTrans Research
arXiv, January 2025
Topics: Pretraining, RL, Reasoning, QA

📝 Paper Summary

Keywords: Pre-training paradigms, Sequence modeling, Self-supervised learning
This comprehensive resource systematically categorizes the foundational techniques of Large Language Models (LLMs), focusing on the shift from specialized supervised learning to general-purpose pre-training followed by adaptation.
Core Problem
Traditional NLP required training a specialized system from scratch for each task using large amounts of task-specific labeled data, which is inefficient and limits generalization.
Why it matters:
  • Training from scratch for every task requires prohibitive amounts of labeled data
  • Specialized models often fail to generalize to new tasks or domains without extensive retraining
  • Pre-training allows models to acquire universal linguistic knowledge once and adapt efficiently to many downstream problems
Concrete Example: In traditional sentiment analysis, a model is trained solely on labeled sentiment data. In the pre-training paradigm, a model (like BERT) is first trained on massive unlabeled text to understand language, then fine-tuned on a small sentiment dataset, achieving better performance with less labeled data.
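The self-supervision underlying the pre-training stage can be illustrated with masked language modeling, BERT's objective: training pairs are manufactured from unlabeled text by hiding tokens and asking the model to recover them. The sketch below is a simplification (real BERT also keeps or randomly replaces a fraction of the selected tokens); the function name and masking scheme are illustrative, not from the paper.

```python
import random

def make_mlm_example(tokens, mask_rate=0.15, mask_token="[MASK]", seed=0):
    """Build one masked-language-modeling pair from raw (unlabeled) tokens.

    Roughly `mask_rate` of the tokens are replaced by a mask symbol;
    the training objective is to predict the original token at each
    masked position. No human labels are needed -- the text supervises itself.
    """
    rng = random.Random(seed)
    inputs, targets = [], []
    for tok in tokens:
        if rng.random() < mask_rate:
            inputs.append(mask_token)
            targets.append(tok)    # loss is computed only at masked positions
        else:
            inputs.append(tok)
            targets.append(None)   # unmasked positions contribute no loss
    return inputs, targets

sentence = "the movie was surprisingly good".split()
masked, labels = make_mlm_example(sentence, mask_rate=0.3)
```

A model pre-trained on billions of such pairs can then be fine-tuned on a small labeled sentiment dataset, which is exactly the efficiency gain the example above describes.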
Key Novelty
Systematic Unification of Pre-training Paradigms
  • Categorizes pre-training into three architectures: Decoder-only (Language Modeling), Encoder-only (Masked Language Modeling), and Encoder-Decoder (Sequence-to-Sequence Denoising)
  • Unified view of adaptation: Contrasts fine-tuning (parameter updates) with prompting (context-based instruction) as two sides of model adaptation
  • Formalizes diverse tasks (translation, classification, regression) into a single text-to-text generation framework
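The text-to-text unification in the last bullet can be sketched as a formatting step: every task, whether generation, classification, or regression, is cast into a (source text, target text) pair, in the spirit of T5. The task prefixes and label verbalizations below are illustrative choices, not the book's exact strings.

```python
def to_text_to_text(task, example):
    """Cast heterogeneous NLP tasks into (source, target) text pairs,
    so that one conditional text generator can handle all of them."""
    if task == "translate_en_de":
        # Generation task: the target is the translated sentence.
        return ("translate English to German: " + example["text"],
                example["translation"])
    if task == "sentiment":
        # Classification task: the class label is verbalized as a word.
        label = "positive" if example["label"] == 1 else "negative"
        return ("sentiment: " + example["text"], label)
    if task == "similarity":
        # Regression task: the numeric score is rendered as a string.
        return ("similarity: " + example["a"] + " | " + example["b"],
                f"{example['score']:.1f}")
    raise ValueError(f"unknown task: {task}")

src, tgt = to_text_to_text("sentiment", {"text": "a tedious plot", "label": 0})
# Every task now reduces to learning model(src) -> tgt.
```

Once tasks share this interface, a single pre-trained sequence model can be fine-tuned or prompted on any of them without task-specific output heads.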
Breakthrough Assessment
6/10
This is a foundational textbook/survey rather than a research paper proposing a novel method. It excels at synthesizing existing knowledge (BERT, GPT, T5) but does not introduce new benchmarks or SOTA results itself.