FActScore: A metric that decomposes a long-form generation into atomic facts, verifies each against a knowledge source (such as Wikipedia), and reports the fraction of facts that are supported
DPO: Direct Preference Optimization, a method that aligns language models to preferences without training a separate reward model, by optimizing a classification-style loss directly on pairs of preferred and rejected responses
RAG: Retrieval-Augmented Generation—augmenting LLM input with retrieved documents to improve factual accuracy
SFT: Supervised Fine-Tuning—training a model on labeled input-output pairs
atomic fact decomposition: Breaking down a complex sentence into individual, verifiable statements
self-rewarding: Using the LLM itself to evaluate the quality of its own or others' outputs during training
RLHF: Reinforcement Learning from Human Feedback, which aligns models using rewards derived from human preferences
RLAIF: Reinforcement Learning from AI Feedback, similar to RLHF but using AI models to generate the feedback/preferences
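
To make two of the entries above concrete, here is a minimal sketch of how a FActScore-style score and a per-pair DPO loss could be computed. The function names (`factscore`, `dpo_loss`), the default `beta=0.1`, and the input conventions are assumptions made for illustration. The sketch also assumes that fact decomposition, verification, and log-probability computation have already been done elsewhere.

```python
import math

def factscore(supported):
    """Fraction of atomic facts judged supported by the knowledge source.

    `supported` is a list of booleans, one per atomic fact of a single
    generation (hypothetical input format, produced by a separate verifier).
    """
    if not supported:
        raise ValueError("need at least one atomic fact")
    return sum(supported) / len(supported)

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair, from summed sequence log-probs.

    loss = -log(sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r))))
    """
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch to avoid overflow.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# Example: three of four atomic facts are supported.
score = factscore([True, True, False, True])
# Example: the policy has not moved from the reference, so the margin is zero.
loss_at_init = dpo_loss(-12.0, -15.0, -12.0, -15.0)
```

When the policy equals the reference model, the margin is zero and the loss is log 2. It decreases as the policy raises the chosen response's likelihood relative to the rejected one.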