SF: Source Faithfulness—measures whether the generated output is consistent with the provided source input
WF: World Factuality—measures whether the generated output aligns with established real-world knowledge and facts
AHE: Automatic Hallucination Evaluation—automated methods to detect and measure hallucinations without human intervention
NLG: Natural Language Generation—the subfield of AI focused on producing human-like text
RAG: Retrieval-Augmented Generation—systems that retrieve external documents to ground their generation
NLI: Natural Language Inference—the task of determining whether a hypothesis is entailed by, contradicts, or is neutral to a premise
QG-QA: Question Generation and Question Answering—an evaluation pipeline where questions are generated from the summary and answered using the source to check consistency
LLM-as-a-Judge: Using a powerful LLM to evaluate the quality or factuality of text generated by another model
atomic facts: The smallest indivisible units of information obtained by decomposing a complex sentence, so that each unit can be verified on its own
SF Evidence: Information extracted from the input source text to verify faithfulness
WF Evidence: Information retrieved from external knowledge bases or the web to verify factuality
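
As a rough illustration of how several of these terms fit together, the sketch below decomposes an output into atomic facts and checks each one against a piece of evidence. It is a toy under stated assumptions, not a method from any particular system: the function names (`split_atomic_facts`, `is_supported`, `support_rate`), the splitting rule, and the 0.8 overlap threshold are all hypothetical. In practice the decomposition is usually done by an LLM and the support check by an NLI model or an LLM-as-a-Judge.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of word tokens (toy tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def split_atomic_facts(sentence):
    """Toy decomposition (assumption): split on ' and ' or ';'.

    Real pipelines typically prompt an LLM to produce atomic facts.
    """
    parts = re.split(r"\s+and\s+|;", sentence)
    return [p.strip(" .") for p in parts if p.strip(" .")]


def is_supported(fact, evidence, threshold=0.8):
    """Toy stand-in for an NLI model or LLM judge (assumption).

    A fact counts as supported when at least `threshold` of its tokens
    also appear in the evidence.
    """
    fact_tokens = tokenize(fact)
    if not fact_tokens:
        return False
    overlap = len(fact_tokens & tokenize(evidence)) / len(fact_tokens)
    return overlap >= threshold


def support_rate(output, evidence):
    """Fraction of atomic facts in `output` that the evidence supports."""
    facts = split_atomic_facts(output)
    if not facts:
        return 0.0
    return sum(is_supported(f, evidence) for f in facts) / len(facts)


# Illustrative data only.
source = "Marie Curie won the Nobel Prize in Physics in 1903."
output = "Marie Curie won the Nobel Prize in Physics and she was born in Paris."

for fact in split_atomic_facts(output):
    print(fact, "->", is_supported(fact, source))
print("support rate:", support_rate(output, source))
```

Passing the input source text as `evidence` gives an SF-style check; passing text retrieved from an external knowledge base instead gives a WF-style check with the same code.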