Knowledge Misalignment: Discrepancy between the factual knowledge embedded in a pre-trained model and the new knowledge introduced during fine-tuning
[REJ] token: A special token added to the vocabulary that the model learns to predict when it is uncertain or lacks knowledge about the ground truth
Abstention Tuning: A modified training objective in which the model can lower its loss by assigning probability to the [REJ] token instead of the ground-truth token when the ground truth is hard to predict
Abstention-aware Decoding: A decoding strategy that subtracts a penalty term based on the [REJ] token's probability from the sequence score to avoid uncertain generation paths
FActScore: A metric for long-form generation that breaks text into atomic claims, checks each claim against a knowledge source (Wikipedia), and reports the fraction of claims that are supported
Parametric Knowledge: Knowledge stored in the model's weights during pre-training, as opposed to knowledge supplied in the context or obtained through external retrieval
SFT: Supervised Fine-Tuning—training a pre-trained model on labeled instruction-response pairs
MLE: Maximum Likelihood Estimation—the standard training objective maximizing the probability of the ground truth tokens
DoLa: Decoding by Contrasting Layers—a method to improve factuality by contrasting logits from different layers
DPO: Direct Preference Optimization—a method to align language models to preferences without a reward model
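
To make the Abstention Tuning and Abstention-aware Decoding entries more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the exact formulation behind these definitions. The loss uses a gambler's-style form, -log(p_truth + p_rej / reward), which is one known way to let a model trade ground-truth probability for rejection probability. The decoding score subtracts a weighted sum of per-step [REJ] probabilities from the sequence log-probability. The names abstention_loss, abstention_aware_score, reward, and lam are hypothetical.

```python
import math


def abstention_loss(p_truth: float, p_rej: float, reward: float = 2.0) -> float:
    """Gambler's-style abstention loss (assumed form, not the source's exact objective).

    p_truth: probability the model assigns to the ground-truth token.
    p_rej:   probability the model assigns to the [REJ] token.
    reward:  hypothetical payoff > 1; larger values make abstaining less attractive.
    """
    return -math.log(p_truth + p_rej / reward)


def mle_loss(p_truth: float) -> float:
    """Standard MLE (cross-entropy) loss for a single token, for comparison."""
    return -math.log(p_truth)


def abstention_aware_score(token_logprobs, rej_probs, lam: float = 1.0) -> float:
    """Sequence score minus a [REJ]-based penalty (assumed form).

    token_logprobs: log-probabilities of the generated tokens.
    rej_probs:      probability of [REJ] at each generation step.
    lam:            hypothetical penalty weight.
    """
    return sum(token_logprobs) - lam * sum(rej_probs)


# A hard-to-predict token: shifting mass to [REJ] lowers the loss.
hard_token_mle = mle_loss(0.1)
hard_token_abstain = abstention_loss(0.1, 0.8)

# Two candidate continuations: B is slightly more likely, but the model
# signals much more uncertainty along its path.
cand_a = abstention_aware_score([-0.5, -0.5], [0.05, 0.05])
cand_b = abstention_aware_score([-0.4, -0.4], [0.3, 0.4])
```

In this sketch, candidate B would win on plain log-probability, while the penalty makes candidate A rank higher because its path carries less [REJ] mass. The choice of penalty weight and of summing (rather than, say, averaging) the per-step [REJ] probabilities is an assumption made for illustration.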