LoRA: Low-Rank Adaptation, a technique for fine-tuning large models by freezing the pretrained weights and training small rank-decomposition matrices that are added to them
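A minimal sketch of the idea with NumPy; the dimensions and names (`d_out`, `d_in`, `r`, `lora_forward`) are illustrative choices, not from the source. The frozen weight `W` is left untouched, and only the low-rank factors `B` and `A` would receive gradients:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions for illustration (not from the source).
d_out, d_in, r = 64, 32, 4

W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight
B = np.zeros((d_out, r))                   # trainable factor, initialized to zero
A = rng.standard_normal((r, d_in)) * 0.01  # trainable factor, small random init

def lora_forward(x):
    # Effective weight is W + B @ A; only B and A are trained.
    return (W + B @ A) @ x

x = rng.standard_normal(d_in)
y = lora_forward(x)
# With B = 0 at initialization, the adapted model matches the frozen one.
assert np.allclose(y, W @ x)
```

Initializing `B` to zero is the common convention: it guarantees the adapter is a no-op at the start of fine-tuning.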
Preconditioner: A transformation applied to gradients (usually multiplying by a matrix) to improve the convergence speed and stability of optimization
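A toy demonstration of why preconditioning helps, assuming a simple ill-conditioned quadratic (the matrix `H` and step sizes are illustrative choices, not from the source). With the ideal preconditioner `P = H⁻¹`, the curvature is equalized and a unit step lands on the optimum, while plain gradient descent is throttled by the largest eigenvalue:

```python
import numpy as np

# Ill-conditioned quadratic f(x) = 0.5 * x^T H x (illustrative example).
H = np.diag([1.0, 100.0])
P = np.linalg.inv(H)          # ideal preconditioner for this quadratic

x_gd = np.array([1.0, 1.0])
x_pre = np.array([1.0, 1.0])
for _ in range(50):
    x_gd = x_gd - (2 / 101) * (H @ x_gd)   # near-optimal stable step for plain GD
    x_pre = x_pre - 1.0 * P @ (H @ x_pre)  # preconditioned step: curvature equalized

# The preconditioned iterate reaches the optimum immediately;
# plain GD is still far from it after 50 steps.
print(np.linalg.norm(x_gd), np.linalg.norm(x_pre))
```

In practice `H⁻¹` is unavailable, so optimizers use cheap approximations of such a matrix; the point here is only the effect of the scaling on convergence.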
Riemannian optimization: Optimization techniques that respect the geometry of the underlying curved space (manifold) where parameters live, rather than assuming a flat Euclidean space
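A small sketch of the project-then-retract pattern on the unit sphere, one of the simplest curved parameter spaces (the objective and all names here are illustrative, not from the source). The Euclidean gradient is projected onto the tangent space, and each step is pulled back onto the manifold:

```python
import numpy as np

# Riemannian gradient descent on the unit sphere S^2 (illustrative).
# Minimize f(x) = x^T C x over unit vectors; the minimizer is the
# eigenvector of C with the smallest eigenvalue.
rng = np.random.default_rng(0)
C = np.diag([3.0, 2.0, 0.5])
x = rng.standard_normal(3)
x /= np.linalg.norm(x)

lr = 0.1
for _ in range(500):
    g = 2 * C @ x               # Euclidean gradient
    g_riem = g - (x @ g) * x    # project onto the tangent space at x
    x = x - lr * g_riem         # step along the tangent direction
    x /= np.linalg.norm(x)      # retract back onto the sphere

# x converges (up to sign) to the smallest-eigenvalue eigenvector.
print(x)
```

The projection and normalization are what "respecting the geometry" means operationally: plain Euclidean updates would immediately leave the constraint set.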
Quotient manifold: A manifold whose points are equivalence classes, so that points representing the same underlying object (e.g., related by a symmetry such as rotation invariance) are identified and treated as one
Scaled GD: Gradient Descent where the gradient is scaled by a preconditioner matrix before the update step
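A hedged sketch of scaled GD on a low-rank matrix factorization, the setting where this preconditioner is most natural (the problem setup, dimensions, and damping constant are illustrative assumptions, not from the source). Each factor's gradient is scaled by the inverse Gram matrix of the other factor before the update:

```python
import numpy as np

rng = np.random.default_rng(1)
d, r = 20, 3
# Exactly rank-r target matrix (illustrative setup).
M = rng.standard_normal((d, r)) @ rng.standard_normal((r, d))
B = rng.standard_normal((d, r))
A = rng.standard_normal((r, d))

lr = 0.5
eps = 1e-8  # small ridge term so the inverses are always well defined
for _ in range(200):
    R = B @ A - M                      # residual of the factorization
    gB, gA = R @ A.T, B.T @ R          # plain gradients of 0.5 * ||BA - M||_F^2
    # Scaled GD: precondition each gradient by the (inverse) Gram matrix of
    # the other factor, which reduces sensitivity to how B and A are balanced.
    B_new = B - lr * gB @ np.linalg.inv(A @ A.T + eps * np.eye(r))
    A_new = A - lr * np.linalg.inv(B.T @ B + eps * np.eye(r)) @ gA
    B, A = B_new, A_new

print(np.linalg.norm(B @ A - M))       # residual after the scaled-GD iterations
```

Note the preconditioners are tiny `r × r` matrices, so the per-step overhead is negligible compared to the gradient computation itself.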
Infinite-width NN: A theoretical framework for analyzing neural networks where the number of neurons in hidden layers approaches infinity, used to study convergence properties
Stable feature learning: A regime in which neural-network updates and outputs stay at a constant order of magnitude as the network width grows, so signals neither explode nor vanish