DPO: Direct Preference Optimization—a method to align language models with human preferences by directly optimizing on preference pairs without a separate reward model
FEVER: Fact Extraction and VERification—a standard benchmark dataset for checking the truthfulness of claims against text evidence
ME-FEVER: Multiple-Evidence FEVER—a new dataset created by the authors extending FEVER with irrelevant and misleading evidence to simulate real-world retrieval noise
Hallucination: Generative AI output that is nonsensical or unfaithful to the provided source content or real-world facts
Critique: A natural language explanation generated by the model justifying why a claim is judged as true, false, or neutral
SFT: Supervised Fine-Tuning—training a pre-trained model on a specific labeled dataset to adapt it for a particular task
Misleading Evidence: Generated evidence in the dataset that is closely related to the claim's topic but neither supports nor refutes the specific claim; it is designed to trick the model
NLI: Natural Language Inference—the task of determining whether a 'hypothesis' (claim) is entailed by, contradicted by, or neutral with respect to a 'premise' (evidence)
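
To make the DPO entry above concrete, here is a minimal sketch of the standard per-pair DPO loss in plain Python. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are illustrative assumptions, not taken from the authors' code. Each argument is the summed log-probability of a whole response under either the policy being trained or the frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log(sigmoid(beta * (chosen margin - rejected margin))).

    All names and the default beta are hypothetical, chosen for illustration.
    """
    # How much more (in log-probability) the policy favors each response
    # than the frozen reference model does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    x = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) equals log(1 + exp(-x)); branch for numerical stability.
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


# Policy identical to the reference: loss is log(2).
baseline = dpo_loss(-5.0, -5.0, -5.0, -5.0)
# Policy raises the chosen response relative to the reference: loss drops.
improved = dpo_loss(-3.0, -7.0, -5.0, -5.0)
```

The loss falls as the policy assigns relatively more probability to the preferred response than the reference does, which is why no separate reward model is needed: the log-probability ratio plays the role of an implicit reward.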