Keypoint Coverage: A metric measuring the proportion of atomic facts (keypoints) from a gold reference answer that are semantically present in the model's generated response
Mix Setting: An evaluation context where the model receives the correct passage mixed with plausible but irrelevant distractor passages, simulating noisy retrieval
Golden Setting: An oracle evaluation context where the model receives only the correct, ground-truth passage
Base Setting: A closed-book evaluation context where the model answers using only its internal parametric knowledge without external evidence
FActScore: A fine-grained atomic evaluation metric that decomposes text into atomic facts for verification; DisastQA's keypoint approach aligns with this concept
Parametric Knowledge: Information stored within the model's weights during pre-training, as opposed to information provided in the input context
Distractor: In MCQ, an incorrect option; in retrieval contexts, an irrelevant passage included to test the model's ability to filter noise