Attention Map: A matrix of attention weights in a Transformer model, where entry (i, j) indicates how much token i attends to token j
Laplacian Matrix: A matrix representation of a graph (L = D - A, where D is the degree matrix and A is the adjacency matrix) that captures structural properties like connectivity and flow; here defined specifically for directed attention graphs
Eigenvalues: Scalars λ for which a matrix M satisfies Mv = λv for some nonzero vector v; they characterize the linear transformation M represents, and in graphs the eigenvalues of the Laplacian describe connectivity and partitioning
Probing: Training a simple classifier (probe) on internal representations of a pre-trained model to predict specific properties (here, truthfulness)
AUROC: Area Under the Receiver Operating Characteristic Curve, a metric for binary classification performance, where 0.5 corresponds to random guessing and 1.0 to perfect classification
PCA: Principal Component Analysis, a technique to reduce the dimensionality of data while preserving as much variance (information) as possible
Log-determinant: The natural logarithm of the determinant of a matrix, used in prior work (AttentionScore) as a summary statistic for attention maps
Out-degree matrix: A diagonal matrix where each entry represents the sum of outgoing edge weights for a node; used here to normalize attention flow
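
To make several of the terms above concrete, here is a minimal sketch that builds a Laplacian L = D - A from a toy causal attention map, reads off its eigenvalues, and computes the log-determinant of the attention map. It is an illustration only, not the exact definition used in the work these terms come from. It assumes that the edge from token j to token i carries weight A[i][j], so the out-degree of token j is the j-th column sum. The attention values and the helper names `laplacian` and `triangular_eigenvalues` are hypothetical. Because a causal attention map is lower triangular, so is L, and the eigenvalues of a triangular matrix are simply its diagonal entries.

```python
import math


def laplacian(attn):
    """Return L = D - A for a square attention map given as a list of rows."""
    n = len(attn)
    # Assumption: edge j -> i carries weight attn[i][j], so the out-degree of
    # node j is the j-th column sum of the attention map.
    out_degree = [sum(attn[i][j] for i in range(n)) for j in range(n)]
    return [
        [(out_degree[i] if i == j else 0.0) - attn[i][j] for j in range(n)]
        for i in range(n)
    ]


def triangular_eigenvalues(matrix):
    """Eigenvalues of a triangular matrix are its diagonal entries."""
    return [matrix[i][i] for i in range(len(matrix))]


# Toy causal attention map for 3 tokens: each row sums to 1 and the upper
# triangle is zero because a token cannot attend to later tokens.
attn = [
    [1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
    [0.2, 0.3, 0.5],
]

lap = laplacian(attn)
eigvals = sorted(triangular_eigenvalues(lap), reverse=True)

# Log-determinant of the attention map: for a triangular matrix the
# determinant is the product of the diagonal, so the log-det is a sum of logs.
log_det = sum(math.log(attn[i][i]) for i in range(len(attn)))

print("Laplacian eigenvalues:", eigvals)
print("Log-determinant of attention map:", log_det)
```

For attention maps that are not triangular, the diagonal shortcut no longer applies and a general eigenvalue routine such as `numpy.linalg.eigvals` would be needed.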