SAM: Sharpness-Aware Minimization, an optimization algorithm that minimizes the worst-case (maximum) loss within a local neighborhood of the current parameters, steering training toward flatter minima that tend to generalize better
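In practice the SAM update has two steps: an ascent step to an approximate worst-case point in the perturbation ball, then a descent step using the gradient taken there. A minimal pure-Python sketch, assuming an L2 ball of radius rho and a user-supplied gradient function (the names sam_step, loss_grad, lr, and rho are illustrative, not from the paper):

```python
import math

def sam_step(w, loss_grad, lr=0.1, rho=0.05):
    """One L2-SAM update on a parameter list w."""
    g = loss_grad(w)
    # Ascent: move distance rho along the normalized gradient,
    # approximating the worst-case point in the L2 ball.
    norm = math.sqrt(sum(gi * gi for gi in g)) + 1e-12
    w_pert = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Descent: apply the gradient evaluated at the perturbed point.
    g_pert = loss_grad(w_pert)
    return [wi - lr * gi for wi, gi in zip(w, g_pert)]

# Toy check on L(w) = 0.5 * ||w||^2, whose gradient is w itself.
w = [1.0, -2.0]
for _ in range(100):
    w = sam_step(w, lambda v: list(v))
```

An L-infinity SAM variant would instead perturb each coordinate by rho * sign(g_i), the maximizer of the linearized loss over the L-infinity ball.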
Linear Diagonal Network: A simplified neural network architecture whose effective linear predictor is the element-wise product of the weight vectors of its L layers (beta = w_1 ⊙ ... ⊙ w_L)
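The diagonal-network predictor is easy to state in code: the L layer vectors collapse into a single linear predictor. A small sketch (function names are illustrative):

```python
def predictor(weights):
    """Collapse L layer vectors w_1, ..., w_L into the effective
    linear predictor beta = w_1 * ... * w_L (element-wise product)."""
    beta = [1.0] * len(weights[0])
    for layer in weights:
        beta = [b * w for b, w in zip(beta, layer)]
    return beta

def predict(weights, x):
    """The network's output on input x is the inner product <beta, x>."""
    return sum(b * xi for b, xi in zip(predictor(weights), x))
```

For example, with L = 2 layers w_1 = (1, 2) and w_2 = (3, 4), the effective predictor is beta = (3, 8), even though the parameterization has four trainable weights.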
Implicit Bias: The tendency of an optimization algorithm (like GD or SAM) to converge to a specific solution (e.g., minimum norm) among many possible solutions that fit the data equally well
L2-SAM: A variant of SAM where the local neighborhood perturbation is constrained by the L2 norm
L-infinity SAM: A variant of SAM where the local neighborhood perturbation is constrained by the L-infinity norm
Max-margin classifier: A classifier that maximizes the distance (margin) from the decision boundary to the nearest data point; equivalently, the L2 max-margin classifier minimizes the L2 norm of the weights subject to classifying every point with margin at least 1, and the L1 max-margin classifier minimizes the L1 norm under the same constraint
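Concretely, for linearly separable data the L2 margin of a weight vector w is the smallest signed distance y_i * <w, x_i> / ||w||_2 over the dataset, and the max-margin classifier maximizes this quantity. A small sketch of the margin computation (names are illustrative):

```python
import math

def l2_margin(w, X, y):
    """Smallest signed L2 distance from the decision boundary
    to any data point: min_i y_i * <w, x_i> / ||w||_2."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        yi * sum(wi * xi for wi, xi in zip(w, x))
        for x, yi in zip(X, y)
    ) / norm
```

Note that scaling w leaves the margin unchanged, which is why maximizing the margin is equivalent to minimizing ||w||_2 under unit-margin constraints.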
Sequential Feature Amplification: The phenomenon observed in this paper where L2-SAM amplifies minor input features early in training before eventually shifting focus to major features
Rescaled flow: A time-reparameterized continuous-time formulation of the optimization dynamics that simplifies analysis by removing the scalar loss derivative term
Directional Convergence: When the normalized parameter vector w/||w|| converges to a fixed direction while the magnitude ||w|| grows to infinity