SFT: Supervised Fine-Tuning—the process of training a pre-trained model on labeled (instruction, output) pairs
Distillation: Transferring capabilities from a strong 'teacher' model (e.g., GPT-4) to a weaker 'student' model by training the student on outputs generated by the teacher
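A minimal sketch of the data-generation step in distillation: query the teacher for responses to a set of instructions, yielding (instruction, output) pairs for SFT of the student. The `teacher` callable is a stand-in for a real model API, and the sample text is illustrative.

```python
# Build a distillation training set: the teacher (stubbed below) answers
# each instruction, and the pairs become the student's SFT data.
def build_distillation_set(instructions, teacher):
    return [{"instruction": q, "output": teacher(q)} for q in instructions]

# usage with a stub in place of a real teacher-model call
pairs = build_distillation_set(
    ["Explain photosynthesis in one sentence."],
    lambda q: "Plants convert sunlight, water, and CO2 into sugar and oxygen.",
)
```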
Self-Improvement: A technique where a model generates its own training data (instructions or responses) to improve itself, often bootstrapping from a small seed set
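A toy bootstrap loop in the spirit of self-improvement (e.g., Self-Instruct-style pipelines): sample a few examples from the current pool, prompt the model to propose a new instruction, filter, and add it back. The `propose` callable and the dedup filter are deliberately simplified stand-ins, not a real pipeline.

```python
import random

def bootstrap(seed_instructions, propose, rounds=2, per_round=2):
    """Grow an instruction pool from a small seed set (illustrative only)."""
    pool = list(seed_instructions)
    for _ in range(rounds):
        # sample in-context examples from the current pool
        examples = random.sample(pool, k=min(2, len(pool)))
        for _ in range(per_round):
            candidate = propose(examples)
            if candidate and candidate not in pool:  # naive dedup filter
                pool.append(candidate)
    return pool

# usage with a deterministic stub in place of a model call
counter = iter(range(100))
pool = bootstrap(["Summarize this article."], lambda ex: f"New task {next(counter)}")
```

In a real pipeline the filter would also check quality and diversity, not just exact duplicates.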
Reasoning Trees: Structured data representations where a problem is broken down into a tree of candidate reasoning steps, often used to train models in complex problem-solving
CoT: Chain-of-Thought—a prompting technique where models are encouraged to generate intermediate reasoning steps before the final answer
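A minimal few-shot chain-of-thought prompt: one worked example with explicit intermediate steps, followed by the new question. The wording and questions are illustrative.

```python
# The demonstrated answer spells out intermediate steps, encouraging the
# model to do the same for the final (unanswered) question.
cot_prompt = (
    "Q: Roger has 5 balls. He buys 2 cans of 3 balls each. How many balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. 5 + 6 = 11. The answer is 11.\n"
    "Q: A baker has 23 muffins and sells 17. How many are left?\n"
    "A:"  # the model is expected to produce reasoning steps before the final answer
)
```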
Back-translation: In this context, generating synthetic instructions for existing text passages (treating the text as the output and predicting the instruction that would have produced it)
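A sketch of back-translation for instruction data, assuming a hypothetical `generate(prompt)` callable that invokes some instruction-following model: an existing passage is treated as the response, and the model is asked to predict the instruction behind it.

```python
def back_translate(passage: str, generate) -> dict:
    """Turn an existing passage into a synthetic (instruction, output) pair."""
    prompt = (
        "Below is a response. Write the instruction it most plausibly answers.\n\n"
        f"Response:\n{passage}\n\nInstruction:"
    )
    instruction = generate(prompt).strip()
    # the synthetic pair goes into the SFT training set
    return {"instruction": instruction, "output": passage}

# usage with a stub in place of a real model call
pair = back_translate(
    "Paris is the capital of France.",
    lambda p: "What is the capital of France?",
)
```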
Process Supervision: Training or evaluating models based on the correctness of intermediate reasoning steps rather than just the final outcome
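An illustrative contrast between outcome and process supervision on one solution: outcome supervision yields a single label tied to the final answer, while process supervision labels every intermediate step, so the trainer can see where the chain first went wrong. The record schema is hypothetical.

```python
# One step-labeled solution trace (labels would come from human or
# automated verifiers in a real dataset).
solution_steps = [
    {"step": "15% of 80 is 0.15 * 80 = 12", "label": "correct"},
    {"step": "80 - 12 = 68",                "label": "correct"},
    {"step": "So the discounted price is 86", "label": "incorrect"},  # digits transposed
]

# Outcome supervision: one label, derived only from the final answer.
outcome_label = solution_steps[-1]["label"]

# Process supervision: locate the first incorrect step in the chain.
first_error = next(
    (i for i, s in enumerate(solution_steps) if s["label"] == "incorrect"),
    None,
)
```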