Grounding documents: External text (like retrieved passages) used as evidence to verify a model's generated response
Atomic facts: The smallest indivisible units of information within a sentence (e.g., 'Obama was born in Hawaii' is one fact; 'Obama, born in Hawaii, was President' has two)
C2D (Claim-to-Doc): A synthetic data generation method that starts with a claim, decomposes it, and generates documents that support or refute specific parts of it
D2C (Doc-to-Claim): A synthetic data generation method that starts with a document chunk, summarizes it, and creates variations to test entailment
LLM-AggreFact: A new aggregated benchmark introduced in this paper, combining 10 existing datasets for evaluating factual consistency
Decontextualization: Rewriting a sentence so it stands alone without surrounding context (e.g., resolving pronouns like 'he' to 'Obama')
SFT: Supervised Fine-Tuning, i.e., training a model on labeled examples
NLI: Natural Language Inference, the task of determining whether a hypothesis is entailed by, contradicted by, or neutral with respect to a premise
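
To make the terms above concrete, here is a minimal Python sketch of how a claim's atomic facts might be checked against a grounding document. It is an illustration only, not the paper's method: `toy_supports` is a hypothetical word-overlap stand-in for a real fact-checking model, and the rule that a claim counts as grounded only when every atomic fact is supported is an assumption made for this example.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Verdict:
    fact: str        # one atomic fact taken from the claim
    supported: bool  # whether the grounding document supports it


def _words(text: str) -> List[str]:
    # Lowercase and keep only word characters, so punctuation is ignored.
    return re.findall(r"\w+", text.lower())


def toy_supports(document: str, fact: str) -> bool:
    # Hypothetical stand-in for a real checker (assumption, not the paper's model):
    # a fact is "supported" if every one of its words occurs in the document.
    doc_words = set(_words(document))
    return all(word in doc_words for word in _words(fact))


def check_claim(
    document: str,
    atomic_facts: List[str],
    supports: Callable[[str, str], bool] = toy_supports,
) -> List[Verdict]:
    # Check each atomic fact separately against the same grounding document.
    return [Verdict(fact, supports(document, fact)) for fact in atomic_facts]


def claim_is_grounded(verdicts: List[Verdict]) -> bool:
    # Assumed aggregation rule: every atomic fact must be supported.
    return all(v.supported for v in verdicts)


if __name__ == "__main__":
    grounding_document = "Obama was born in Hawaii in 1961."
    # The claim 'Obama, born in Hawaii, was President' split into two atomic facts.
    atomic_facts = ["Obama was born in Hawaii", "Obama was President"]

    verdicts = check_claim(grounding_document, atomic_facts)
    for v in verdicts:
        print(f"{v.fact!r}: supported={v.supported}")
    print("claim grounded:", claim_is_grounded(verdicts))
```

In this sketch the first atomic fact is supported by the document and the second is not, so the claim as a whole is not grounded. Any function with the same `(document, fact) -> bool` shape can be passed as `supports` in place of the toy checker.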