Internal States: The vector representations (activations) at specific layers of a neural network as it processes input.
Probing Classifier: A small, simple model (like a linear classifier or MLP) trained on the frozen representations of a large pre-trained model to test if those representations encode specific properties.
NLI: Natural Language Inference, the task of determining whether a hypothesis is entailed by, contradicts, or is neutral with respect to a premise; used here to check factual consistency.
QuestEval: A reference-dependent metric for evaluating faithfulness in generation tasks using question answering.
PPL: Perplexity, the exponential of the average negative log-likelihood a model assigns to a token sequence. Lower values mean the model predicts the sample better; often used as a proxy for model uncertainty.
SiLU: Sigmoid Linear Unit, the activation function SiLU(x) = x · sigmoid(x); used in the feed-forward layers of the Llama architecture, among other networks.
Llama-2-7B: A specific open-source Large Language Model released by Meta with 7 billion parameters.
MLP: Multilayer Perceptron, a feedforward neural network built from stacked fully connected layers with nonlinear activations between them.
Super-Natural Instructions: A large benchmark dataset containing a diverse set of NLP tasks and instructions.
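
As a rough illustration of the PPL and SiLU entries above, the sketch below computes both from their standard textbook definitions. The function names `perplexity` and `silu` are hypothetical helpers chosen for this example, and the input is assumed to be a list of natural-log token probabilities; real toolkits may differ in log base or in how they handle tokenization.

```python
import math


def silu(x: float) -> float:
    """SiLU(x) = x * sigmoid(x), written to avoid overflow for large |x|."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    # For negative x, exp(x) stays in (0, 1), so nothing overflows.
    ex = math.exp(x)
    return x * ex / (1.0 + ex)


def perplexity(token_log_probs: list[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood), natural log assumed."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


if __name__ == "__main__":
    # Assumed toy input: four tokens, each predicted with probability 0.25.
    log_probs = [math.log(0.25)] * 4
    print("perplexity:", perplexity(log_probs))
    print("silu(1.0):", silu(1.0))
```

In the toy input, a model that assigns probability 0.25 to every token is as uncertain as a uniform choice among four options, which is why perplexity is often read as an effective branching factor.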