Zero-knowledge: In this context, operating without external knowledge sources (e.g., databases, search APIs); distinct from cryptographic zero-knowledge proofs
Hallucination: Generative model outputs that are plausible-sounding but factually incorrect or nonsensical
CoT: Chain-of-thought, a prompting technique in which the model is asked to articulate its reasoning steps before giving a final answer
RAG: Retrieval-Augmented Generation, an approach in which a system fetches external documents to ground LLM answers
SelfCheckGPT: A baseline method that detects hallucinations by sampling multiple outputs from the same model and checking for consistency
Cross-consistency: Checking for factual agreement between outputs generated by different model architectures (e.g., Llama vs. Claude) rather than between repeated samples from a single model
FELM: A dataset for evaluating the factuality of large language model outputs
GPQA-diamond: A challenging dataset of graduate-level multiple-choice questions used to test reasoning and factuality
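
The difference between the single-model consistency check (SelfCheckGPT entry) and cross-consistency can be illustrated with a small sketch. Everything here is an assumption made for illustration: the function names are hypothetical, the model outputs are supplied as plain strings rather than sampled from real models, and agreement is approximated by token-overlap (Jaccard) similarity, which is a crude stand-in and not the scoring used by SelfCheckGPT or any particular paper.

```python
from itertools import combinations

PUNCT = ".,;:!?"


def token_set(text: str) -> set[str]:
    """Lowercased tokens with surrounding punctuation stripped."""
    tokens = (tok.strip(PUNCT).lower() for tok in text.split())
    return {tok for tok in tokens if tok}


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity in [0, 1]; a toy proxy for factual agreement."""
    ta, tb = token_set(a), token_set(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def self_consistency(samples: list[str]) -> float:
    """Mean pairwise agreement among samples drawn from ONE model."""
    pairs = list(combinations(samples, 2))
    if not pairs:
        return 1.0  # a single sample cannot disagree with itself
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def cross_consistency(outputs_by_model: dict[str, list[str]]) -> float:
    """Mean agreement between outputs of DIFFERENT models only.

    Pairs of outputs from the same model are deliberately skipped.
    """
    scores = []
    for (_, outs_a), (_, outs_b) in combinations(outputs_by_model.items(), 2):
        for a in outs_a:
            for b in outs_b:
                scores.append(jaccard(a, b))
    if not scores:
        return 1.0
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical outputs; in practice these would come from model calls.
    outputs = {
        "model_a": ["The capital of Australia is Canberra."],
        "model_b": ["Canberra is the capital of Australia."],
        "model_c": ["The capital of Australia is Sydney."],
    }
    print("self (model_a):", self_consistency(outputs["model_a"]))
    print("cross:", cross_consistency(outputs))
```

A low score under either function would flag a response as a possible hallucination. The cross-model variant is the one that can catch an error a single model repeats consistently across its own samples.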