context-grounded hallucinations: Generated content containing information that is not supported by, or cannot be verified from, the provided source text
atomic facts: Short sentences, each containing a single piece of information, obtained by decomposing a longer text; often used as the unit of analysis in factual consistency evaluation
Extrinsic Correct: A type of hallucination where the model adds information not in the source text, but the information happens to be factually true in the real world
LLM-as-a-judge: Using a strong LLM to evaluate the quality or correctness of outputs from another model
meta-evaluation: Evaluating the evaluation method itself; here, measuring how well LLMs perform as judges of factual consistency
P(True): The probability an LLM assigns to a statement being true, used here to check whether the model 'knows' a fact internally
Chain-of-Thought (CoT): A prompting strategy where the model is encouraged to generate intermediate reasoning steps before producing the final answer
F1 score: The harmonic mean of precision (the fraction of detected errors that are actual errors) and recall (the fraction of actual errors that are detected)
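
As an illustration of the F1 score entry, the following minimal Python sketch computes precision, recall, and F1 for a hallucination detector. The function name `precision_recall_f1`, the boolean label encoding (True means "hallucinated"), and the toy labels are assumptions made for this example; they are not taken from the source.

```python
from typing import Sequence, Tuple


def precision_recall_f1(gold: Sequence[bool], predicted: Sequence[bool]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for binary error detection.

    Assumed encoding (hypothetical, for illustration only):
    True marks an item labeled as a hallucination, False marks a supported item.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    # Count true positives, false positives, and false negatives.
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)

    # Guard against division by zero when nothing is predicted or nothing is gold-positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0

    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1


# Toy labels (made up): 3 real hallucinations, the judge flags 3 items, 2 of them correctly.
gold_labels = [True, True, False, False, True]
judge_labels = [True, False, True, False, True]
precision, recall, f1 = precision_recall_f1(gold_labels, judge_labels)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Because F1 is a harmonic mean, it stays low whenever either precision or recall is low, so a judge that flags everything (perfect recall, poor precision) is not rewarded.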