activation space: The high-dimensional vector space formed by the intermediate outputs (activations) of neurons within a neural network
factuality hallucination: Generating content that contradicts verifiable real-world facts (e.g., 'Sydney is the capital of Australia')
faithfulness hallucination: Generating content that deviates from user intent, context, or internal consistency, even if factually correct in isolation
HDBSCAN: Hierarchical Density-Based Spatial Clustering of Applications with Noise, a clustering algorithm used here to find dense regions of shared activations
contrastive loss: A loss function that pulls positive pairs (correct/faithful examples) closer and pushes negative pairs (hallucinations) apart in the embedding space
orthogonality constraint: A mathematical condition enforcing that different probe vectors remain mutually perpendicular (pairwise dot product of zero), so that each probe captures a distinct feature
spectral clustering: A technique that uses the leading eigenvectors of a similarity matrix (or of the graph Laplacian derived from it) to partition data into clusters, used here to group semantically related activations
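
As a rough illustration of two of the terms above, the sketch below shows one common way to write a contrastive loss and an orthogonality penalty in NumPy. The function names, the margin-based formulation, and the Frobenius-norm penalty are assumptions chosen for illustration; the source does not specify which variants are actually used.

```python
import numpy as np

def contrastive_loss(a, b, is_positive, margin=1.0):
    """Margin-based pairwise contrastive loss (one common formulation).

    Positive pairs are penalized by their squared distance (pulled together);
    negative pairs are penalized only when closer than `margin` (pushed apart).
    """
    d = float(np.linalg.norm(a - b))
    if is_positive:
        return d ** 2
    return max(0.0, margin - d) ** 2

def orthogonality_penalty(W):
    """Squared Frobenius norm of (W W^T - I) for probe vectors stored as rows.

    The penalty is zero exactly when the rows of W are orthonormal, so adding
    it to a training objective pushes the probes toward perpendicular directions.
    """
    gram = W @ W.T
    return float(np.sum((gram - np.eye(W.shape[0])) ** 2))
```

In this form, the orthogonality term would typically be added to the main loss with a weighting coefficient; that weight is a further hyperparameter the glossary does not mention.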