Generative Teaching: The use of a powerful model to create synthetic data specifically designed to teach new skills or behaviors to another model
Agentic Flow: A sequence of operations performed by AI agents (often LLMs with tools) that includes loops, reflection, and iterative refinement
Suggester-Editor Agents: A dual-agent pattern where one agent proposes edits to increase complexity or quality, and the other applies them
Content Transformation: The process of converting raw seed text into intermediate formats (e.g., debates, meeting transcripts) to facilitate diverse instruction generation
Model Collapse: A degenerative process in which models trained on synthetic data lose variance and quality over successive generations
SFT: Supervised Fine-Tuning, the training of a pretrained model on labeled input-output examples
GSM8K: Grade School Math 8K, a benchmark of roughly 8.5K grade school math word problems that require multi-step reasoning
MMLU: Massive Multitask Language Understanding, a multiple-choice benchmark covering 57 subjects such as math, history, and law
AGIEval: A benchmark designed to evaluate foundation models using standardized exams (e.g., GRE, LSAT)
BBH: BIG-Bench Hard, a subset of 23 BIG-Bench tasks on which earlier language models did not outperform the average human rater
RAG: Retrieval-Augmented Generation, an approach in which a system retrieves external documents and supplies them to the model as context for answering a question
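
To make the Suggester-Editor Agents entry above more concrete, here is a minimal sketch of the loop in Python. It is an illustration only, not the implementation of any particular system: `suggest`, `apply_edit`, and `refine` are hypothetical names, and the two agents are replaced by fixed stub functions where a real flow would prompt an LLM.

```python
from typing import List


def suggest(instruction: str) -> List[str]:
    """Suggester agent (stub): propose edits that make the instruction harder.

    Assumption: a real suggester would prompt an LLM with the instruction.
    """
    return [
        "require a step-by-step justification",
        "add a constraint on the answer format",
    ]


def apply_edit(instruction: str, suggestion: str) -> str:
    """Editor agent (stub): rewrite the instruction to include one suggestion.

    Assumption: a real editor would also be an LLM call, not string formatting.
    """
    return f"{instruction} Additionally, {suggestion}."


def refine(instruction: str, rounds: int = 2) -> str:
    """Run the suggester-editor loop for a fixed number of rounds."""
    for _ in range(rounds):
        for suggestion in suggest(instruction):
            instruction = apply_edit(instruction, suggestion)
    return instruction


if __name__ == "__main__":
    seed = "Summarize the main argument of the passage."
    print(refine(seed, rounds=1))
```

Each round passes the current instruction to the suggester and then lets the editor fold in the proposals one at a time, which is the iterative refinement that the Agentic Flow entry describes. The fixed round count is a simplification; a stopping rule based on a quality check would be another reasonable choice.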