Semantic Equivalence: The property that two different text sequences (e.g., "Paris" and "France's capital is Paris") express the same underlying meaning
Predictive Entropy: A measure of uncertainty given by the entropy of the model's predictive distribution over outputs; higher entropy means the model is less certain
NLI: Natural Language Inference, a classification task that determines whether one sentence (the hypothesis) is entailed by, contradicts, or is neutral with respect to another (the premise)
Bidirectional Entailment: A criterion under which two sentences are considered semantically equivalent if sentence A entails sentence B and sentence B entails sentence A
ROUGE-L: A metric based on the longest common subsequence between a generated text and a reference text, often used for evaluating text generation quality
AUROC: Area Under the Receiver Operating Characteristic curve, a metric for binary classification (here, predicting whether an answer is correct) where 0.5 corresponds to random guessing and 1.0 to perfect discrimination
Monte Carlo Integration: A technique for estimating the value of an integral or expectation (here, the entropy) by averaging a quantity evaluated on random samples
OPT: Open Pre-trained Transformer, a family of openly released large language models similar to GPT-3
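
To make two of the terms above concrete, here is a minimal Python sketch, not taken from the source. It illustrates a Monte Carlo estimate of predictive entropy (the negative mean log-probability of sampled sequences) and grouping of answers by bidirectional entailment. All function names are hypothetical, and `toy_entails` is a lookup-table stand-in for a real NLI model, used only so the example runs on its own.

```python
import math
from typing import Callable, List


def mc_predictive_entropy(sequence_log_probs: List[float]) -> float:
    """Monte Carlo entropy estimate: the negative mean log-probability of
    sequences that were sampled from the model's predictive distribution."""
    return -sum(sequence_log_probs) / len(sequence_log_probs)


def cluster_by_bidirectional_entailment(
    answers: List[str], entails: Callable[[str, str], bool]
) -> List[List[str]]:
    """Group answers so that each one is mutually entailed by the first
    member of its cluster (assumes equivalence is roughly transitive)."""
    clusters: List[List[str]] = []
    for answer in answers:
        for cluster in clusters:
            representative = cluster[0]
            # Equivalent only if entailment holds in BOTH directions.
            if entails(representative, answer) and entails(answer, representative):
                cluster.append(answer)
                break
        else:
            # No existing cluster matched, so start a new one.
            clusters.append([answer])
    return clusters


# Assumption: a toy lookup table replaces a trained NLI classifier.
TOY_MEANINGS = {
    "Paris": "paris",
    "France's capital is Paris": "paris",
    "Rome": "rome",
}


def toy_entails(premise: str, hypothesis: str) -> bool:
    """Hypothetical entailment check: true when both map to the same meaning."""
    return TOY_MEANINGS[premise] == TOY_MEANINGS[hypothesis]


clusters = cluster_by_bidirectional_entailment(list(TOY_MEANINGS), toy_entails)
# Made-up log-probabilities of three sampled sequences, for illustration only.
entropy = mc_predictive_entropy([math.log(0.5), math.log(0.25), math.log(0.25)])
```

In practice `entails` would wrap an NLI classifier's prediction, and the log-probabilities would come from the language model that generated the samples; comparing against a single cluster representative is a simplification chosen to keep the sketch short.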