
OPT: Open Pre-trained Transformer Language Models

Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, Todor Mihaylov, Myle Ott, Sam Shleifer, Kurt Shuster, Daniel Simig, Punit Singh Koura, Anjali Sridhar, Tianlu Wang, Luke Zettlemoyer
Meta AI
arXiv, May 2022
Tags: Pretraining, Benchmark, Factuality, QA

📝 Paper Summary

Tags: Large Language Models (LLMs), Open-Source Foundation Models
OPT is a suite of open-source decoder-only transformers (125M to 175B parameters) that replicates GPT-3 performance and capabilities while releasing full model weights, code, and training logbooks to democratize LLM research.
Core Problem
Access to full model weights of massive LLMs (like GPT-3) is restricted to a few well-resourced labs, hindering the broader research community's ability to study their limitations, robustness, and bias.
Why it matters:
  • Restricted access prevents independent verification of model capabilities and safety claims
  • Academic and civil society researchers cannot study the 'how and why' of LLM behaviors without weight access
  • Replicating these models is prohibitively expensive (compute/carbon) for most organizations, centralizing power
Concrete Example: While GPT-3 is available via paid APIs, researchers cannot inspect its weights to understand why it generates toxic content or how specific training data affects output. OPT releases the weights and training logs so researchers can audit the 175B model directly.
Key Novelty
Full-Transparency Replication of GPT-3
  • Releases a 175B parameter model (OPT-175B) and smaller baselines trained to match GPT-3 architecture and performance
  • Provides a 'logbook' detailing the messy reality of training at scale, including hardware failures, loss divergences, and mid-flight hyperparameter adjustments
  • Achieves training with 1/7th the carbon footprint of GPT-3 by using newer hardware and efficient codebases
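The "175B" headline number can be sanity-checked with the standard parameter count for a decoder-only transformer. A minimal sketch, assuming OPT-175B uses the GPT-3-scale configuration it replicates (96 layers, hidden size 12288, ~50k-token vocabulary; these hyperparameters are assumptions from the GPT-3 spec, not stated in this summary):

```python
def decoder_param_count(n_layers: int, d_model: int, vocab_size: int) -> int:
    """Rough parameter count for a decoder-only transformer.

    Per layer: 4*d^2 for attention (Q, K, V, and output projections)
    plus 8*d^2 for the MLP (two d x 4d matrices); biases and layer
    norms are negligible at this scale.
    """
    per_layer = 12 * d_model ** 2
    embeddings = vocab_size * d_model  # token embedding matrix
    return n_layers * per_layer + embeddings

# Assumed GPT-3-scale configuration that OPT-175B replicates
total = decoder_param_count(n_layers=96, d_model=12288, vocab_size=50272)
print(f"{total / 1e9:.1f}B parameters")  # ≈ 174.6B, i.e. the "175B" model
```

The estimate lands within a fraction of a percent of 175B, which is why models of this configuration are conventionally labeled "175B".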
Evaluation Highlights
  • OPT-175B achieves zero-shot accuracy comparable to GPT-3 across 14 standard NLP tasks (average accuracy roughly 55–60%)
  • On hate speech detection (ETHOS), OPT-175B substantially outperforms Davinci (the GPT-3 API), e.g., F1 0.812 vs. 0.672 in the few-shot multiclass setting
  • Training ran on 992 80GB A100 GPUs, sustaining 147 TFLOP/s of utilization per GPU
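The hardware figures above imply an aggregate cluster throughput. A quick back-of-the-envelope check: the GPU count and per-GPU rate come from the summary, while the A100's dense BF16 tensor-core peak of 312 TFLOP/s is an assumption about the hardware spec:

```python
n_gpus = 992              # 80GB A100s, from the summary
per_gpu_tflops = 147.0    # sustained per-GPU throughput, from the summary
a100_peak_tflops = 312.0  # A100 dense BF16 tensor-core peak (assumed spec)

aggregate_pflops = n_gpus * per_gpu_tflops / 1000  # TFLOP/s -> PFLOP/s
utilization = per_gpu_tflops / a100_peak_tflops

print(f"aggregate: {aggregate_pflops:.1f} PFLOP/s")  # ≈ 145.8 PFLOP/s
print(f"fraction of peak: {utilization:.0%}")        # ≈ 47% of peak
```

Sustaining roughly half of theoretical peak across nearly a thousand GPUs is what makes the 1/7th-of-GPT-3 carbon figure plausible on newer hardware.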
Breakthrough Assessment
9/10
While architecturally standard (a GPT-3 replica), releasing the weights of a 175B-parameter model together with a detailed engineering logbook was a major milestone for open science and for democratizing LLM research.