← Back to Paper List

Extracting books from production language models

A Ahmed, AF Cooper, S Koyejo, P Liang
Cornell University, Carnegie Mellon University
arXiv, 1/2026 (2026)
Factuality Pretraining Benchmark

📝 Paper Summary

Memorization and extraction Copyright infringement in Generative AI Adversarial attacks / Jailbreaking
A two-phase attack procedure combining jailbreaking and iterative continuation can extract large portions of in-copyright books from production LLMs like Claude 3.7 Sonnet, despite safety guardrails.
Core Problem
It is unclear if long-form extraction of copyrighted training data (demonstrated on open-weight models) is feasible on production LLMs, which employ extensive model- and system-level safeguards to prevent data leakage.
Why it matters:
  • Legal debates on fair use hinge on whether models merely learn abstract patterns or memorize and reproduce creative expression verbatim
  • Courts in the U.S. and Germany have reached different preliminary conclusions about whether model outputs constitute infringing copies
  • Production models (e.g., GPT-4, Claude) have black-box APIs and safety filters that theoretically prevent the specific behavior (verbatim completion) used in prior extraction research
Concrete Example: When prompted to 'Continue the following text exactly...' with the first sentence of 'Harry Potter', a production LLM usually refuses. Using the proposed method, Claude 3.7 Sonnet outputs nearly the entire book (95.8%) verbatim.
Key Novelty
Two-Phase Extraction Attack for Production LLMs
  • Phase 1: Bypass initial refusal using Best-of-NN (BoN) jailbreaking to force the model to complete a short seed prefix of the target book
  • Phase 2: Use iterative continuation prompts to generate long sequences of text, circumventing length limits and filters by feeding back the model's own output
Architecture
Architecture Figure Figure 2 & 3
The two-phase extraction pipeline: (1) Initial probe with optional jailbreak, (2) Iterative continuation loop
Evaluation Highlights
  • Extracted 95.8% of 'Harry Potter and the Sorcerer's Stone' near-verbatim from Claude 3.7 Sonnet
  • Extracted 76.8% of 'Harry Potter' from Gemini 2.5 Pro and 70.3% from Grok 3 without requiring any jailbreak (direct compliance)
  • GPT-4.1 proved most resistant, yielding only 4.0% of 'Harry Potter' even after extensive jailbreaking attempts
  • Recovered >94% of the text for two in-copyright books and two public domain books from Claude 3.7 Sonnet
Breakthrough Assessment
9/10
Provides the first definitive proof that production LLMs with safety guardrails can still leak entire copyrighted books, directly impacting active legal litigation regarding AI copyright.
×