Hallucination: Inaccurate or fabricated information generated by LLMs that contradicts input, context, or established facts.
ICD: Induce-then-Contrast Decoding, a method that first fine-tunes a copy of the model to hallucinate, then uses that copy at decoding time to penalize similar outputs from the base model.
Positive Model: The standard, pre-trained LLM used as the reference for factual generation.
Evil Model: A version of the LLM fine-tuned to deliberately generate hallucinations or avoid correct answers.
Contrastive Decoding: A decoding strategy that subtracts the token scores (logits or log-probabilities) of a weaker or negative model from those of a stronger model, favoring tokens that the stronger model prefers and the weaker one does not.
LoRA: Low-Rank Adaptation—a parameter-efficient fine-tuning technique that freezes pre-trained weights and injects trainable rank decomposition matrices.
MC1/MC2/MC3: Multiple-choice metrics for TruthfulQA. MC1: Whether the model's highest-scored answer is the single best true answer. MC2: Normalized total probability assigned to the true answers. MC3: Average proportion of true answers scored higher than the false ones.
FactScore: A metric of factual precision in long-form text generation, computed by breaking a response into atomic facts and checking each one against a knowledge source.
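
The Contrastive Decoding entry above, with the Positive Model as the strong model and the Evil Model as the negative one, can be sketched on a toy vocabulary as follows. This is a minimal illustration, not the exact ICD formulation: the function names, the weight `beta`, and the plausibility cutoff `alpha` (a common safeguard in contrastive decoding that is not described in this glossary) are all assumptions.

```python
import math

def log_softmax(logits):
    """Convert raw logits to log-probabilities in a numerically stable way."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def contrastive_scores(strong_logits, weak_logits, alpha=0.1, beta=1.0):
    """Score each token as log p_strong - beta * log p_weak.

    Tokens whose strong-model probability is below alpha times the strong
    model's top probability are masked to -inf, so the subtraction cannot
    promote tokens the strong model itself finds implausible.
    alpha and beta are illustrative values (assumptions), not tuned settings.
    """
    lp_strong = log_softmax(strong_logits)
    lp_weak = log_softmax(weak_logits)
    cutoff = max(lp_strong) + math.log(alpha)
    return [
        s - beta * w if s >= cutoff else float("-inf")
        for s, w in zip(lp_strong, lp_weak)
    ]

# Toy 4-token vocabulary: the negative model strongly prefers token 0.
strong = [2.0, 1.8, 0.1, -3.0]
weak = [3.0, 0.5, 0.1, 0.0]

scores = contrastive_scores(strong, weak)
greedy_token = max(range(len(strong)), key=strong.__getitem__)
contrastive_token = max(range(len(scores)), key=scores.__getitem__)
```

In this toy case plain greedy decoding picks token 0, while the contrastive score picks token 1, because token 0 is exactly what the negative model is most eager to produce.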
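
The LoRA entry above can be illustrated with a tiny forward pass in plain Python. This is a sketch under stated assumptions: the helper names, the matrix shapes, and the `scale` factor are hypothetical choices made for illustration, and a real setup would use a tensor library and train `A` and `B` by gradient descent.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lora_forward(x, W, A, B, scale=1.0):
    """Compute h = W x + scale * B (A x).

    W is the frozen pre-trained weight (d_out x d_in).
    A (r x d_in) and B (d_out x r) are the trainable low-rank factors.
    """
    base = matvec(W, x)               # frozen path
    update = matvec(B, matvec(A, x))  # low-rank path, rank r = len(A)
    return [b + scale * u for b, u in zip(base, update)]

# Frozen 3x3 weight (identity, for readability) and a rank-1 adapter.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
A = [[0.5, -0.5, 0.0]]
x = [1.0, 2.0, 3.0]

# With B initialized to zero, the adapter is a no-op and h equals W x.
B_init = [[0.0], [0.0], [0.0]]
h_init = lora_forward(x, W, A, B_init)

# After (hypothetical) training, B is non-zero and shifts the output.
B_trained = [[1.0], [0.0], [2.0]]
h_trained = lora_forward(x, W, A, B_trained)

full_params = len(W) * len(W[0])                       # 9
lora_params = len(A) * len(A[0]) + len(B_trained) * len(B_trained[0])  # 6
```

The saving is negligible at this size, but the adapter's parameter count grows as r * (d_in + d_out) rather than d_in * d_out, which is what makes the approach parameter-efficient for large layers.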
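
The MC1/MC2/MC3 entry above can be made concrete with a small calculation for a single question. This sketch follows one common way of computing the metrics from per-answer log-probabilities; the function name, the `best_index` parameter, and the toy scores are assumptions, and the official TruthfulQA evaluation may differ in details.

```python
import math

def mc_metrics(true_logps, false_logps, best_index=0):
    """Return (MC1, MC2, MC3) for one question.

    true_logps / false_logps: model log-probabilities of each true / false
    reference answer. best_index marks the single best true answer.
    """
    max_false = max(false_logps)

    # MC1: is the best true answer scored above every false answer?
    mc1 = 1.0 if true_logps[best_index] > max_false else 0.0

    # MC2: probability mass on true answers, normalized over all answers.
    p_true = [math.exp(s) for s in true_logps]
    p_false = [math.exp(s) for s in false_logps]
    mc2 = sum(p_true) / (sum(p_true) + sum(p_false))

    # MC3: fraction of true answers scored above every false answer.
    mc3 = sum(s > max_false for s in true_logps) / len(true_logps)
    return mc1, mc2, mc3

# Toy question with two true and two false reference answers.
true_logps = [math.log(0.4), math.log(0.1)]
false_logps = [math.log(0.3), math.log(0.2)]
mc1, mc2, mc3 = mc_metrics(true_logps, false_logps)
```

Benchmark-level scores are then the averages of these per-question values over the whole dataset.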