Evaluation Setup
The evaluation fine-tunes pre-trained models on natural language understanding (NLU), natural language generation (NLG), instruction tuning, and image classification tasks.
Benchmarks:
- GLUE (Natural Language Understanding)
- E2E (Natural Language Generation)
- Instruction Tuning (LLaMA-2) (Instruction Following)
- Image Classification (ViT) (Computer Vision)
Metrics:
- Accuracy
- BLEU
- NIST
- ROUGE-L
- METEOR
- Statistical methodology: Not explicitly reported in the paper
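Of the metrics above, ROUGE-L is the one used for the instruction-tuning rows below; it is an F-measure over the longest common subsequence (LCS) between candidate and reference. A minimal sketch (real evaluations typically use a library such as `rouge-score`; the `beta` weighting here is an illustrative assumption):

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, tok_a in enumerate(a):
        for j, tok_b in enumerate(b):
            dp[i + 1][j + 1] = (dp[i][j] + 1 if tok_a == tok_b
                                else max(dp[i][j + 1], dp[i + 1][j]))
    return dp[-1][-1]

def rouge_l(candidate, reference, beta=1.2):
    """ROUGE-L F-score between two whitespace-tokenized strings."""
    cand, ref = candidate.split(), reference.split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)
```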
Key Results
| Benchmark | Metric | Baseline (LoRA) | This Paper (FourierFT) | Δ |
|---|---|---|---|---|
| Instruction Tuning (LLaMA-2-7B) | ROUGE-L | 42.0 | 42.4 | +0.4 |
| Instruction Tuning (LLaMA-2-7B) | ROUGE-L | 42.8 | 42.8 | 0.0 |
| GLUE (Avg) | Score | 87.52 | 87.77 | +0.25 |
| Image Classification (ViT-Base) | Accuracy | 0.68 | 0.76 | +0.08 |

Notes:
- Instruction tuning on LLaMA-2-7B: FourierFT matches or exceeds LoRA with drastically fewer parameters.
- GLUE: FourierFT matches LoRA performance with significantly fewer parameters.
- Computer vision: comparable or better accuracy with <10% of the parameters.
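A back-of-the-envelope comparison illustrates where the parameter gap comes from: LoRA stores two low-rank factors per adapted weight matrix, while FourierFT stores only a small set of spectral coefficients. The hidden size, rank `r`, and coefficient count `n` below are illustrative assumptions, not the paper's exact configurations:

```python
# Parameter counts per adapted d x d weight matrix (illustrative values).
d = 4096          # hidden size, e.g. a LLaMA-2-7B-scale model
r = 8             # assumed LoRA rank
n = 1000          # assumed number of FourierFT spectral coefficients

lora_params = 2 * r * d    # low-rank factors A (d x r) and B (r x d)
fourierft_params = n       # learned coefficients at fixed frequencies

print(lora_params, fourierft_params, fourierft_params / lora_params)
```

With these assumed values FourierFT uses roughly 1.5% of LoRA's per-matrix parameters; the paper's reported ratios depend on its actual hyperparameters.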
Main Takeaways
- FourierFT consistently matches or exceeds LoRA performance across NLU, NLG, and CV tasks.
- Achieves extreme compression rates: ~0.2% of LoRA's parameters for instruction tuning and ~6% for GLUE tasks.
- Parameter efficiency advantage increases as model scale grows (e.g., from RoBERTa Base to Large).
- Frequency bias analysis suggests different tasks may benefit from learning coefficients in specific frequency bands (low vs. high).
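The mechanism behind these takeaways can be sketched as follows: FourierFT-style methods treat the weight delta as the inverse 2D Fourier transform of a sparse spectral matrix, so only the few nonzero coefficients are trainable while their frequency positions stay frozen. A minimal NumPy sketch under assumed shapes and a hypothetical scale `alpha` (not the authors' implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
d1, d2, n = 64, 64, 50                 # assumed weight shape, coefficient count

rows = rng.integers(0, d1, size=n)     # fixed (frozen) frequency positions
cols = rng.integers(0, d2, size=n)
coeffs = rng.standard_normal(n)        # the only trainable parameters

spectrum = np.zeros((d1, d2), dtype=complex)
spectrum[rows, cols] = coeffs          # sparse spectral matrix

alpha = 1.0                            # assumed scaling hyperparameter
delta_w = alpha * np.fft.ifft2(spectrum).real   # dense weight update
```

Restricting `rows`/`cols` to low or high frequencies is one way to probe the frequency-bias observation in the last bullet.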