AFG: Atomic Fact Generation—breaking down long sentences into short, indivisible factual claims (atomic facts)
AFV: Atomic Fact Validation—verifying whether an atomic fact is supported by a trusted knowledge source (e.g., Wikipedia)
FActScore: A metric measuring the percentage of atomic facts in a generated text that are supported by a knowledge source
BERTScore-F1: A semantic similarity metric using contextual embeddings to compare generated text against reference text
GTR: Generalizable T5-based Retriever—a dense retrieval model used to find relevant Wikipedia passages
System Prompt: Initial instructions given to a chat model to define its behavior (used here to guide open models to act like InstructGPT)
Chat Template: A formatting standard for converting conversation history into a single string for LLMs (e.g., wrapping user inputs in specific tokens)
Pearson correlation: A statistic measuring linear correlation between two sets of data (here, ranking agreement between original and open metrics)