CoT: Chain-of-Thought—a prompting technique where models generate intermediate reasoning steps before the final answer
Activation Steering: Modifying the internal hidden states (activations) of a model during inference to influence its behavior without changing weights
Residual Stream: The primary pathway of information flow in a Transformer: a running sum of hidden states to which each attention and feed-forward layer adds its output
Steering Vector: A direction vector in activation space added to the residual stream to induce a specific behavior (here, conciseness)
KL divergence: Kullback-Leibler divergence—a statistical measure quantifying how one probability distribution differs from a reference distribution
Jacobian: A matrix of first-order partial derivatives representing the local sensitivity of the model's outputs to changes in activations
Hessian: A matrix of second-order partial derivatives representing the local curvature of a scalar function of the model (such as a loss or a single output value) with respect to activations
t-SNE: t-Distributed Stochastic Neighbor Embedding—a technique for visualizing high-dimensional data (like activations) in 2D or 3D
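
To make the Activation Steering, Steering Vector, and KL divergence entries concrete, here is a minimal toy sketch. It is not taken from the source: the matrix `W`, the hidden state `h`, the direction `v`, the strength `alpha`, and all helper names are hypothetical values chosen for illustration. A single 3-dimensional vector stands in for the residual stream and a small matrix stands in for the output projection.

```python
import math

def softmax(logits):
    """Convert logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def unembed(W, h):
    """Toy output projection: logits = W @ h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]

def steer(h, v, alpha):
    """Activation steering: add a scaled steering vector to the hidden state."""
    return [x + alpha * d for x, d in zip(h, v)]

def kl_divergence(p, q):
    """KL(p || q): how much p differs from the reference distribution q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical toy values (assumptions, not from the source).
W = [[1.0, 0.0, 0.5],
     [0.0, 1.0, -0.5],
     [0.5, 0.5, 0.0]]
h = [0.2, -0.1, 0.4]   # stand-in for a residual-stream activation
v = [0.0, 1.0, 0.0]    # stand-in for a steering direction

p_base = softmax(unembed(W, h))
p_steered = softmax(unembed(W, steer(h, v, alpha=2.0)))

# Distance of the steered output distribution from the unsteered reference.
kl = kl_divergence(p_steered, p_base)
print(f"KL(steered || base) = {kl:.4f}")
```

Setting `alpha` to 0 leaves the activation unchanged, so the divergence is exactly zero. In a real model the addition would happen inside a chosen layer during inference, with the weights left untouched.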