Rubric-based Reward Modeling: Evaluating responses based on explicit, structured criteria (rubrics) rather than a single scalar score
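A minimal sketch of the idea: instead of emitting one scalar, the evaluator scores each criterion separately and aggregates. The `Criterion` class, the lambda checks, and the weighting scheme below are illustrative assumptions, not a specific system's API.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Criterion:
    name: str
    weight: float
    check: Callable[[str], bool]  # hypothetical: maps a response to pass/fail

def rubric_score(response: str, rubric: list) -> dict:
    # Score each criterion independently, then aggregate;
    # the per-criterion breakdown is kept, unlike a single scalar reward.
    results = {c.name: c.check(response) for c in rubric}
    total = sum(c.weight for c in rubric if results[c.name])
    return {"per_criterion": results, "total": total}

# Toy rubric with two illustrative criteria.
rubric = [
    Criterion("gives_reason", 1.0, lambda r: "because" in r),
    Criterion("concise", 0.5, lambda r: len(r.split()) < 50),
]
```

The breakdown in `per_criterion` is what makes rubric-based scores auditable: a preference can be traced to the specific criteria a response passed or failed.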
Contrastive Profiling: Analyzing the differences between a chosen and a rejected response to isolate the specific factors driving the preference
Evidence-Anchored Constraint: A requirement that evaluation criteria must be grounded in specific text spans (evidence) from the instruction and response
SFT: Supervised Fine-Tuning—training a model on labeled examples to adapt it to a specific task
GenRM: Generative Reward Model—a model that outputs reasoning traces or critiques alongside a score, rather than just a number
Verbosity Bias: The tendency of language models to prefer longer responses regardless of quality
Bradley-Terry Model: A statistical model used to predict the probability of preferring one item over another in a pair
Teacher-Student Distillation: Training a smaller 'student' model to replicate the behavior or outputs of a larger, more capable 'teacher' model
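One common form of distillation trains the student to match the teacher's softened output distribution via cross-entropy. The sketch below uses plain Python on raw logit lists to show the core computation; real implementations use a tensor library and typically mix this with a hard-label loss, which is omitted here.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature > 1 softens the distribution, exposing the teacher's
    # relative confidence across wrong answers ("dark knowledge").
    exps = [math.exp(l / temperature) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Cross-entropy between teacher and student softened distributions;
    # minimized when the student reproduces the teacher's distribution.
    teacher_p = softmax(teacher_logits, temperature)
    student_p = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher_p, student_p))
```

The loss is smallest when the student's logits induce the same distribution as the teacher's, so gradient descent on it pulls the student toward the teacher's behavior.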