Hallucinations: Instances where a model generates plausible but fabricated information not supported by the source
Factual Inconsistency: Generated text that contradicts the source material or established facts
Ensemble Learning: Combining the outputs of multiple models (in this case, prompts) to produce a more accurate prediction than any single one
Balanced Accuracy: The arithmetic mean of sensitivity and specificity, used to evaluate performance on imbalanced datasets
ECE: Expected Calibration Error, a metric measuring the average gap between a model's predicted confidence and its observed accuracy, typically computed over confidence bins and weighted by bin size
Platt Scaling: A parametric calibration method that fits a logistic regression to a model's raw output scores to map them to calibrated probabilities
LabelModel: A method from the Snorkel framework that learns conditional probabilities of noisy labeling functions (prompts) to reweight their outputs without ground truth data
Chain of Thought (CoT): A prompting technique where the model produces intermediate reasoning steps before the final answer
Weak Supervision: Using noisy, limited, or imprecise sources (like heuristics or prompts) to label training data
RFE: Recursive Feature Elimination—a feature selection technique that recursively removes the least important features
mRMR: Minimum Redundancy Maximum Relevance—a feature selection method maximizing relevance to the target while minimizing redundancy among features
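
The two evaluation metrics defined above (Balanced Accuracy and ECE) can be made concrete with a minimal sketch. The function names, the binary 0/1 label encoding, and the choice of 10 equal-width confidence bins are illustrative assumptions, not details taken from the source.

```python
def balanced_accuracy(y_true, y_pred):
    """Arithmetic mean of sensitivity and specificity for binary 0/1 labels.

    Assumes both classes appear at least once in y_true.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return (sensitivity + specificity) / 2


def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin-size-weighted average of |mean confidence - accuracy| per bin.

    confidences: predicted confidence in [0, 1] for each prediction.
    correct: 1 if the prediction was right, 0 otherwise.
    n_bins: number of equal-width bins (10 is an assumed default).
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece


# Imbalanced toy data: 2 positives, 8 negatives, one positive missed.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(balanced_accuracy(y_true, y_pred))  # 0.75, versus a plain accuracy of 0.9

# Four predictions at 0.9 confidence, three of them correct: ECE is about 0.15.
print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0]))
```

On the toy data, plain accuracy (0.9) hides the missed positive, while balanced accuracy (0.75) does not, which is why the latter is preferred on imbalanced datasets.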