Contextual Hallucination: Information in a model's response that is either unsubstantiated by or contradictory to the source text (faithfulness error)
NLI: Natural Language Inference, the task of classifying whether a hypothesis (here, the response) is entailed by, contradicted by, or neutral with respect to a premise (here, the context)
Perplexity: A measurement of how well a probability model predicts a sample; lower scores indicate the text is more fluent/predictable to the model
MCC: Matthews Correlation Coefficient—a quality metric for binary classifications that is robust to class imbalance, ranging from -1 to +1
ROC AUC: Area Under the Receiver Operating Characteristic Curve, a measure of classifier performance that summarizes the trade-off between true positive rate and false positive rate across all decision thresholds
CLS token: A special token in BERT-like models used to represent the aggregate meaning of the entire sequence for classification tasks
O(n^2): Quadratic complexity, meaning that when the input length n doubles, the cost roughly quadruples; standard Transformer self-attention is quadratic in sequence length in both time and memory
HAT: Hierarchical Attention Transformer—a model designed for long documents using segment-wise and cross-segment attention
Longformer: A Transformer variant with sparse attention that scales linearly with sequence length, allowing longer inputs
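RoBERTa: Robustly Optimized BERT Pretraining Approach, a BERT variant with the same architecture but a revised pretraining recipe (more data, longer training, dynamic masking, no next-sentence prediction objective)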
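
Worked example for the metric entries above (Perplexity, MCC, ROC AUC): a minimal pure-Python sketch of how each quantity can be computed from its textbook definition. The function names, the label convention (1 = hallucinated, 0 = faithful), and the toy inputs are illustrative assumptions, not taken from any particular library or from the evaluated systems.

```python
import math


def perplexity(token_log_probs):
    """Perplexity = exp(mean negative natural-log probability per token)."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def mcc(y_true, y_pred):
    """Matthews Correlation Coefficient for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0 when a row or column of the confusion
    # matrix is empty and the coefficient is otherwise undefined.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def roc_auc(y_true, scores):
    """ROC AUC as the probability that a random positive example
    outscores a random negative one (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy inputs (assumed): 1 = hallucinated, 0 = faithful.
    labels = [0, 0, 1, 1]
    detector_scores = [0.1, 0.4, 0.35, 0.8]
    predictions = [1 if s >= 0.5 else 0 for s in detector_scores]

    print("perplexity:", perplexity([math.log(0.25)] * 4))
    print("MCC:", mcc(labels, predictions))
    print("ROC AUC:", roc_auc(labels, detector_scores))
```

The pairwise ROC AUC loop is quadratic in the number of examples, which is fine for illustration; a rank-based implementation is the usual choice at scale. MCC depends on the chosen decision threshold (0.5 here, an arbitrary assumption), while ROC AUC does not.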