
HOFT: Householder Orthogonal Fine-tuning

Alejandro Moreno Arcas, Albert Sanchis, Jorge Civera, Alfons Juan
Valencian Research Institute for Artificial Intelligence, MLLP group, Universitat Politècnica de València
arXiv (2025)
Tags: Reasoning · Pretraining · MM

📝 Paper Summary

Topics: Parameter-Efficient Fine-Tuning (PEFT) · Orthogonal Fine-Tuning
HOFT parameterizes fine-tuning updates using two efficient orthogonal matrices constructed via Householder transformations to preserve hyperspherical energy while reducing computational complexity.
Core Problem
Existing orthogonal fine-tuning methods like OFT and BOFT are computationally expensive and memory-intensive due to matrix inversions and inefficient parameterizations, making them difficult to scale.
Why it matters:
  • Orthogonal fine-tuning preserves pre-trained knowledge better than low-rank methods by maintaining hyperspherical energy, but its cost hinders adoption
  • Current orthogonal methods (OFT, BOFT) struggle to balance expressivity (covering the full orthogonal group) with runtime efficiency
  • Adapting with a single orthogonal matrix covers only part of the solution space of the orthogonal Procrustes problem; full expressivity requires orthogonal matrices on both sides
Concrete Example: When adapting a pre-trained matrix M, methods like OFT multiply by only one orthogonal matrix Q (M' = QM). This fails to capture all possible adapted matrices that preserve singular values. HOFT uses two matrices (M' = Q_U M Q_V) to fully cover the solution space.
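The key property of this double-sided update can be checked numerically: multiplying by orthogonal matrices on both sides leaves the singular values of the weight matrix unchanged. A minimal NumPy sketch, where the random Q_U and Q_V are stand-ins for the learned factors in HOFT:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))  # stand-in for a pre-trained weight matrix

# Random orthogonal matrices via QR decomposition (illustrative stand-ins
# for HOFT's learned Householder-based factors).
Q_U, _ = np.linalg.qr(rng.standard_normal((4, 4)))
Q_V, _ = np.linalg.qr(rng.standard_normal((4, 4)))

# Double-sided update: M' = Q_U M Q_V
M_adapted = Q_U @ M @ Q_V

# Singular values (and hence the geometry that hyperspherical energy
# measures) are preserved by the orthogonal update.
sv_before = np.linalg.svd(M, compute_uv=False)
sv_after = np.linalg.svd(M_adapted, compute_uv=False)
assert np.allclose(sv_before, sv_after)
```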
Key Novelty
Double-sided Householder Orthogonal Fine-tuning (HOFT)
  • Constructs two orthogonal matrices (Q_U and Q_V) using accumulated Householder reflections to adapt weights from both sides, ensuring full expressivity
  • Uses the CWY transform with a fast Neumann series approximation for matrix inversion, reducing complexity from cubic to linear/quadratic in rank r
  • Introduces SHOFT (Scaled HOFT), which adds a learnable magnitude vector between the orthogonal transformations to decouple direction and magnitude updates
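The accumulation of Householder reflections via the CWY transform, with a truncated Neumann series standing in for an exact triangular inverse, can be sketched as follows. This is a minimal NumPy illustration of the general technique, not the authors' implementation; the function name and the exact series arrangement are assumptions.

```python
import numpy as np

def cwy_orthogonal(V, neumann_terms=None):
    """Product of r Householder reflections H_i = I - 2 v_i v_i^T,
    accumulated with the compact WY (CWY) form:

        Q = I - V S^{-1} V^T,  S = 0.5*I + striu(V^T V),

    where the columns v_i of V are unit vectors. S^{-1} is computed with
    a truncated Neumann series; because striu(V^T V) is nilpotent, r - 1
    terms already make the series exact (fewer terms give an approximation).
    """
    d, r = V.shape
    V = V / np.linalg.norm(V, axis=0, keepdims=True)   # unit columns
    A = np.triu(V.T @ V, k=1)                          # strictly upper triangular
    K = r - 1 if neumann_terms is None else neumann_terms
    # Neumann series: (I + 2A)^{-1} = sum_k (-2A)^k, so S^{-1} = 2 (I + 2A)^{-1}
    S_inv, term = np.eye(r), np.eye(r)
    for _ in range(K):
        term = term @ (-2.0 * A)
        S_inv = S_inv + term
    return np.eye(d) - V @ (2.0 * S_inv) @ V.T

# Sanity check: Q matches the explicit product of reflections and is orthogonal.
rng = np.random.default_rng(0)
d, r = 6, 3
V = rng.standard_normal((d, r))
Q = cwy_orthogonal(V)

V_unit = V / np.linalg.norm(V, axis=0, keepdims=True)
Q_explicit = np.eye(d)
for i in range(r):
    v = V_unit[:, i:i+1]
    Q_explicit = Q_explicit @ (np.eye(d) - 2.0 * v @ v.T)

assert np.allclose(Q, Q_explicit)
assert np.allclose(Q @ Q.T, np.eye(d))
```

In HOFT, two such matrices (Q_U and Q_V) adapt the weight from both sides; SHOFT would additionally insert a learnable magnitude vector between the two transformations, a detail not shown in this sketch.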
Evaluation Highlights
  • +1.4 accuracy points on GSM8K (Llama-3-8B) compared to LoRA, and +0.4 compared to DoRA
  • Achieves 2x-3x speedup in training time compared to standard Orthogonal Fine-tuning (OFT) while maintaining or exceeding performance
  • SHOFT outperforms DoRA on commonsense reasoning tasks (avg +0.5%) with fewer trainable parameters
Breakthrough Assessment
7/10
Strong theoretical grounding for using two orthogonal matrices. The efficient CWY inversion approximation makes orthogonal fine-tuning practical, offering a competitive alternative to LoRA/DoRA.