RARR: Retrofit Attribution using Research and Revision, a system that revises generated text so that it can be attributed to retrieved evidence
FacTool: A fact-checking framework that uses external tools (e.g., Google Search) to verify LLM outputs
Factcheck-GPT: A framework that decomposes long text into atomic claims for fine-grained verification
FactQA: A dataset collected by the authors comprising 6,480 factual questions from 7 existing corpora
FactBench: A benchmark collection by the authors comprising 4 datasets with human annotations to test fact-checker accuracy
FreshEval: A metric for evaluating whether questions with time-sensitive answers (e.g., stock prices) are answered correctly
BM25: Best Matching 25, a probabilistic ranking function that scores documents by their relevance to the terms of a query
YAML: A human-readable data serialization language used here for configuration files
Claim Processor: A module that breaks down a document into individual atomic claims for verification
Retriever: A module that searches for external evidence relevant to a claim
Verifier: A module that determines the truthfulness of a claim based on retrieved evidence
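
The last three entries describe modules that are chained together: the Claim Processor splits a document into claims, the Retriever gathers evidence for each claim, and the Verifier judges each claim against that evidence. The sketch below illustrates only that data flow. All function names, the `Verdict` type, the sentence-splitting rule, the token-overlap scoring, and the 0.9 threshold are assumptions made for illustration; they are not the authors' implementation, and a real system would likely use an LLM or a search backend for each step.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Verdict:
    claim: str
    evidence: List[str]
    supported: bool


def _tokens(text: str) -> set:
    """Lowercased alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def process_claims(document: str) -> List[str]:
    """Toy Claim Processor (assumption): treat each sentence as one claim."""
    parts = re.split(r"(?<=[.!?])\s+", document.strip())
    return [p for p in parts if p]


def retrieve(claim: str, corpus: List[str], top_k: int = 1) -> List[str]:
    """Toy Retriever (assumption): rank passages by token overlap with the claim."""
    claim_tokens = _tokens(claim)
    scored = [(len(claim_tokens & _tokens(doc)), doc) for doc in corpus]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    # Drop passages that share no tokens with the claim.
    return [doc for score, doc in ranked[:top_k] if score > 0]


def verify(claim: str, evidence: List[str], threshold: float = 0.9) -> bool:
    """Toy Verifier (assumption): a claim counts as supported when at least
    `threshold` of its tokens appear in a single evidence passage."""
    claim_tokens = _tokens(claim)
    if not claim_tokens:
        return False
    return any(
        len(claim_tokens & _tokens(passage)) / len(claim_tokens) >= threshold
        for passage in evidence
    )


def check_document(document: str, corpus: List[str]) -> List[Verdict]:
    """Run Claim Processor -> Retriever -> Verifier over one document."""
    verdicts = []
    for claim in process_claims(document):
        evidence = retrieve(claim, corpus)
        verdicts.append(Verdict(claim, evidence, verify(claim, evidence)))
    return verdicts


if __name__ == "__main__":
    corpus = [
        "Paris is the capital of France.",
        "The Nile is a river in Africa.",
    ]
    text = "Paris is the capital of France. Berlin is the capital of France."
    for verdict in check_document(text, corpus):
        print(verdict)
```

Keeping the three stages as separate functions mirrors the modular description in the definitions: any one stage can be swapped (for example, replacing the overlap-based retriever with a BM25 ranker) without touching the other two.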