RLHF: Reinforcement Learning from Human Feedback—a technique to fine-tune models using a reward model trained on human preferences
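The reward model mentioned here is typically trained with a pairwise preference loss: given a human-preferred and a rejected reply, the model's reward for the preferred one is pushed higher. A minimal sketch of that Bradley-Terry-style objective (the scalar rewards and function name are illustrative, not from the source):

```python
import math

def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise preference loss: -log sigmoid(r_chosen - r_rejected).

    Minimizing this pushes the reward of the human-preferred reply
    above the reward of the rejected reply.
    """
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Equal rewards give the maximum-uncertainty loss of log(2);
# a larger margin in favor of the chosen reply lowers the loss.
tie = preference_loss(1.0, 1.0)
confident = preference_loss(2.0, 0.0)
```

The trained reward model then scores full model responses during the RL stage.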
SFT: Supervised Fine-Tuning—training a model on a dataset of high-quality instruction-response pairs
Conversation Tree (CT): A data structure in which a root prompt branches into multiple replies, each of which can branch further, yielding diverse conversation paths from a single prompt
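A conversation tree can be sketched as nodes holding a message and a list of child replies; each root-to-leaf path is one complete conversation. This is a minimal illustration, not the project's actual schema (the `role` values and helper names are assumptions):

```python
from dataclasses import dataclass, field

@dataclass
class MessageNode:
    role: str                     # e.g. "prompter" or "assistant" (assumed labels)
    text: str
    children: list = field(default_factory=list)

def add_reply(parent: MessageNode, role: str, text: str) -> MessageNode:
    """Attach a reply, branching the tree at `parent`."""
    child = MessageNode(role, text)
    parent.children.append(child)
    return child

def paths(node: MessageNode) -> list:
    """Enumerate all root-to-leaf conversation paths as lists of messages."""
    if not node.children:
        return [[node.text]]
    return [[node.text] + p for c in node.children for p in paths(c)]

root = MessageNode("prompter", "How do trees branch?")
a = add_reply(root, "assistant", "Reply A")
add_reply(root, "assistant", "Reply B")
add_reply(a, "prompter", "Follow-up to A")
```

Here the root has two replies and one of them has a follow-up, so the tree contains two distinct conversation paths.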
PPO: Proximal Policy Optimization—an RL algorithm used to optimize the model's policy against the reward model
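PPO's defining piece is the clipped surrogate objective, which limits how far a single update can move the policy from the one that collected the data. A per-sample sketch of that standard objective (this illustrates the general algorithm, not the specific training code used here):

```python
def ppo_clip_loss(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Clipped surrogate loss for one sample.

    ratio     -- pi_new(a|s) / pi_old(a|s), the policy probability ratio
    advantage -- estimated advantage of the taken action
    eps       -- clip range; 0.2 is a common default
    Returns the negated objective (a loss to minimize).
    """
    unclipped = ratio * advantage
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps) * advantage
    # Taking the minimum makes the objective pessimistic: large policy
    # shifts cannot be rewarded beyond the clip boundary.
    return -min(unclipped, clipped)

loss = ppo_clip_loss(1.5, 1.0)   # ratio outside the clip range is capped at 1.2
```

With a positive advantage and a ratio of 1.5, the clip caps the effective ratio at 1.2, so pushing the policy further yields no extra objective gain.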
Vicuna Elo Rank: A relative skill rating for chatbots derived from Elo updates over pairwise comparisons, often with GPT-4 as the judge
Tree State Machine: The system logic governing the data collection process, transitioning conversation trees through states like 'prompt review', 'growing', and 'finished'
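The state-machine logic can be illustrated with an explicit transition table: a tree may only move along the allowed edges, and anything else is rejected. This is a simplified sketch using just the three states named above; the real system likely defines additional states and triggers:

```python
from enum import Enum, auto

class TreeState(Enum):
    PROMPT_REVIEW = auto()   # root prompt awaiting approval
    GROWING = auto()         # collecting replies at the tree's leaves
    FINISHED = auto()        # terminal: no further contributions accepted

# Hypothetical transition table (the actual system may differ).
TRANSITIONS = {
    TreeState.PROMPT_REVIEW: {TreeState.GROWING},   # prompt approved
    TreeState.GROWING: {TreeState.FINISHED},        # enough replies collected
    TreeState.FINISHED: set(),                      # terminal state
}

def advance(state: TreeState, target: TreeState) -> TreeState:
    """Move a tree to `target`, enforcing the allowed transitions."""
    if target not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state.name} -> {target.name}")
    return target

s = advance(TreeState.PROMPT_REVIEW, TreeState.GROWING)
```

Centralizing the transitions in one table makes illegal jumps (e.g. straight from review to finished) fail loudly instead of silently corrupting the collection pipeline.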
Detoxify: A model-based tool for detecting toxic comments, used here to validate moderation efficacy