TSV: Truthfulness Separator Vector—a learnable vector added to LLM hidden states to push truthful and hallucinated representations apart
Optimal Transport: A mathematical framework used here to assign pseudo-labels to unlabeled data by minimizing the 'cost' of moving data points to class prototypes
AUROC: Area Under the Receiver Operating Characteristic curve—a metric measuring how well a classifier distinguishes between classes (0.5 is random, 1.0 is perfect)
von Mises-Fisher distribution: A probability distribution on the unit hypersphere, parameterized by a mean direction and a concentration parameter; used here to model normalized embeddings where direction matters more than magnitude
Steering Vector: A vector added to model activations to influence behavior or representation without changing weights
Pseudo-labeling: Assigning approximate labels to unlabeled data based on the model's current confidence, allowing that data to be used for training
Latent space: The vector space of a model's internal representations (for example, its hidden states), in which each input corresponds to a point
Sinkhorn algorithm: An efficient iterative algorithm for solving entropy-regularized optimal transport problems by alternately rescaling the rows and columns of a kernel matrix
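
To make the Optimal Transport, pseudo-labeling, and Sinkhorn entries concrete, the following is a minimal NumPy sketch of how soft pseudo-labels could be derived from class prototypes. It is illustrative only, not the method's actual implementation: the function name `sinkhorn_pseudo_labels`, the cosine-distance cost, the uniform per-sample mass, the `class_prior` argument, and the values of `epsilon` and `n_iters` are all assumptions made for this example.

```python
import numpy as np

def sinkhorn_pseudo_labels(embeddings, prototypes, class_prior,
                           epsilon=0.05, n_iters=200):
    """Hypothetical helper: soft pseudo-labels via entropy-regularized OT."""
    # Project onto the unit sphere so that only direction matters.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    mu = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)

    # Assumed cost: cosine distance between each sample and each prototype.
    cost = 1.0 - z @ mu.T                      # shape (n_samples, n_classes)
    kernel = np.exp(-cost / epsilon)           # Gibbs kernel

    n, k = cost.shape
    row_mass = np.full(n, 1.0 / n)             # each sample carries equal mass
    col_mass = np.asarray(class_prior, dtype=float)
    col_mass = col_mass / col_mass.sum()       # target class proportions

    # Sinkhorn iterations: alternately rescale rows and columns.
    u = np.ones(n)
    v = np.ones(k)
    for _ in range(n_iters):
        u = row_mass / (kernel @ v)
        v = col_mass / (kernel.T @ u)

    plan = u[:, None] * kernel * v[None, :]    # transport plan
    # Normalize each row so it reads as a distribution over classes.
    return plan / plan.sum(axis=1, keepdims=True)

# Toy data: three points near the first prototype, three near the second.
embeddings = np.array([[1.0, 0.1], [1.0, -0.1], [0.9, 0.2],
                       [0.1, 1.0], [-0.1, 1.0], [0.2, 0.9]])
prototypes = np.array([[1.0, 0.0], [0.0, 1.0]])
soft = sinkhorn_pseudo_labels(embeddings, prototypes, class_prior=[0.5, 0.5])
labels = soft.argmax(axis=1)                   # hard pseudo-labels
```

Each row of `soft` is a distribution over classes, and `labels` takes the most likely class per sample. The column marginal (`class_prior` here) is what distinguishes this from a plain nearest-prototype assignment: it constrains how much total mass each class may receive.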