Metamorphic Relations (MR): Properties specifying how the output of a system should change (or not change) when the input is modified in a specific way; used here to check logical consistency
SelfCheckGPT: A baseline method that detects hallucinations by sampling multiple responses from an LLM and checking for consistency; often fails if the model consistently hallucinates the same error
Hallucination Score: A score from 0 to 1 indicating the probability that a response is factually incorrect, derived from how the response violates the metamorphic relations
Synonymous Mutation: A generated variation of a sentence that preserves its original meaning (e.g., lexical substitution)
Antonymous Mutation: A generated variation of a sentence that conveys the opposite meaning
Test Oracle: A mechanism for determining whether a system has behaved correctly for a given test execution
Zero-resource: Methods that do not require external databases, search engines, or training data
TruthfulQA-Enhanced: A version of the TruthfulQA benchmark in which the authors updated the correct answers
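
To make the link between metamorphic relations and the hallucination score concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' formula: it assumes the model has already judged each mutation as supported or not, that a synonymous mutation of a factual response should be judged supported, that an antonymous mutation should be judged unsupported, and that the score is simply the fraction of violated relations. The function name and arguments are hypothetical.

```python
from typing import Sequence


def hallucination_score(
    syn_supported: Sequence[bool],
    ant_supported: Sequence[bool],
) -> float:
    """Return the fraction of violated metamorphic relations (0 to 1).

    Assumption for illustration only: each entry is the model's verdict
    on one mutation of the original response (True = judged supported).
    """
    total = len(syn_supported) + len(ant_supported)
    if total == 0:
        raise ValueError("at least one mutation verdict is required")

    # A synonymous mutation keeps the meaning, so rejecting it is a violation.
    syn_violations = sum(1 for ok in syn_supported if not ok)
    # An antonymous mutation flips the meaning, so accepting it is a violation.
    ant_violations = sum(1 for ok in ant_supported if ok)

    return (syn_violations + ant_violations) / total


# Three synonymous verdicts (one rejected) and two antonymous verdicts
# (one accepted) give 2 violations out of 5 relations.
score = hallucination_score([True, True, False], [False, True])
```

Because the verdicts come from the model itself, a score of this kind needs no external database or search engine, which is what makes the approach zero-resource.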