DPO: Direct Preference Optimization—a method to align language models by optimizing a classification loss on preference pairs, implicitly solving a KL-regularized reward maximization problem
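A minimal sketch of the standard DPO pairwise loss on a single preference pair, using scalar sequence log-probabilities; the variable names and the beta value are illustrative, not taken from the paper:

```python
import math

def dpo_loss(pi_logp_w, pi_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO classification loss on one preference pair:
    -log sigmoid(beta * [(log pi(y_w|x) - log pi_ref(y_w|x))
                         - (log pi(y_l|x) - log pi_ref(y_l|x))]),
    where y_w is the preferred (winning) response and y_l the rejected one."""
    margin = beta * ((pi_logp_w - ref_logp_w) - (pi_logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# With no separation between winner and loser, the loss is log 2;
# increasing the implicit reward margin decreases the loss.
print(dpo_loss(0.0, 0.0, 0.0, 0.0))  # log(2) ≈ 0.6931
print(dpo_loss(1.0, 0.0, 0.0, 0.0) < dpo_loss(0.0, 0.0, 0.0, 0.0))  # True
```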
BTL: Bradley-Terry-Luce model—a standard probabilistic model where the probability of choosing one item over another depends on the difference of their underlying 'rewards'
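The BTL choice probability can be sketched in a few lines; note that only the reward difference matters, so shifting both rewards by a constant leaves the choice probability unchanged:

```python
import math

def btl_prob(r_i: float, r_j: float) -> float:
    """Probability that item i is chosen over item j under Bradley-Terry-Luce:
    P(i > j) = sigmoid(r_i - r_j), a function of the reward difference only."""
    return 1.0 / (1.0 + math.exp(-(r_i - r_j)))

print(btl_prob(1.0, 1.0))                         # 0.5: equal rewards, even odds
print(btl_prob(2.0, 1.0) == btl_prob(5.0, 4.0))   # True: invariant to a common shift
```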
Savage's Theory: A foundational framework for eliciting subjective probabilities using proper scoring rules (losses that encourage honest reporting)
Proper Loss: A loss function L(p, q) that is minimized in expectation when the predicted distribution q matches the true distribution p
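A quick numerical check that the log (cross-entropy) loss is proper: reporting the true distribution achieves lower expected loss than any other report. The distributions here are illustrative:

```python
import math

def expected_log_loss(p, q):
    """E_{y~p}[-log q(y)]: the expected log loss of reporting q when the
    true outcome distribution is p. Log loss is a canonical proper loss:
    the expectation is minimized over q exactly when q = p."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

p = [0.7, 0.3]
honest = expected_log_loss(p, p)             # report the true distribution
dishonest = expected_log_loss(p, [0.5, 0.5])  # hedge toward uniform
print(honest < dishonest)  # True: honest reporting is optimal in expectation
```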
KLST*: The paper's proposed framework extending choice theory to include Expandability, Local Choice Structure, and Monotonicity, supporting abstention
Machina's Lotteries: A generalized choice theory that allows for preferences over probability distributions (lotteries) without requiring the strict independence axiom
Bregman Divergence: A distance measure defined by a strictly convex function, generalizing metrics like squared Euclidean distance and KL divergence
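A Bregman divergence can be sketched directly from its definition, D_F(p, q) = F(p) - F(q) - ⟨∇F(q), p - q⟩; choosing F as the squared norm or the negative entropy recovers squared Euclidean distance and KL divergence, respectively:

```python
import math

def bregman(F, grad_F, p, q):
    """Bregman divergence D_F(p, q) = F(p) - F(q) - <grad F(q), p - q>
    for a strictly convex generator F."""
    return F(p) - F(q) - sum(g * (pi - qi) for g, pi, qi in zip(grad_F(q), p, q))

# F(x) = ||x||^2 recovers squared Euclidean distance.
sq = lambda x: sum(xi * xi for xi in x)
grad_sq = lambda x: [2.0 * xi for xi in x]
print(bregman(sq, grad_sq, [1.0, 2.0], [0.0, 0.0]))  # 5.0 = ||p - q||^2

# F(x) = sum x log x (negative entropy) recovers KL(p || q) on distributions.
negent = lambda x: sum(xi * math.log(xi) for xi in x)
grad_negent = lambda x: [math.log(xi) + 1.0 for xi in x]
p, q = [0.7, 0.3], [0.5, 0.5]
kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
print(abs(bregman(negent, grad_negent, p, q) - kl) < 1e-9)  # True
```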
PPPO: Proper-Proper Preference Optimization—the general class of algorithms defined by this paper using proper losses for both the reward objective and the final classification loss
SimPO: Simple Preference Optimization—a reference-free DPO variant that uses the length-normalized average log probability as the implicit reward and adds a target reward margin to the preference loss
ODIN: A method that disentangles reward from length to mitigate length bias in RLHF
Abstention: The ability of a choice model to assign non-zero probability to not picking either option (e.g., 'I don't know' or 'They are equal')
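One generic way to model abstention is a three-way softmax over the two option rewards and a tie score; this is an illustrative sketch, not the paper's exact abstention model:

```python
import math

def choice_with_abstention(r_i, r_j, tie_score=0.0):
    """Illustrative three-outcome choice model: a softmax over
    (r_i, r_j, tie_score) assigns non-zero probability to abstaining
    ('they are equal') as well as to each option."""
    scores = [r_i, r_j, tie_score]
    m = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]         # [P(choose i), P(choose j), P(abstain)]

# When both rewards equal the tie score, all three outcomes are equally likely.
print(choice_with_abstention(1.0, 1.0, 1.0))  # [1/3, 1/3, 1/3]
```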
gemma2_2b_it: The instruction-tuned 2-billion-parameter model from the Gemma 2 family, used as the base model for the experiments