Evaluation Setup
Fine-tune LLaMA2-7B on domain-specific datasets and evaluate on the corresponding test sets
Benchmarks:
- MMLU (General/Medical/Law Knowledge)
- GSM8K (Mathematical Reasoning)
- HumanEval (Code Generation)
- BIG-Bench Hard (BBH; diverse NLU/NLG tasks)
Metrics:
- Training Speed (time)
- Energy Consumption
- Accuracy (implied from benchmark usage)
- Statistical methodology: Not explicitly reported in the paper
Main Takeaways
- Observation I: Multiple small LoRA heads outperform a single large LoRA head on diverse domains, suggesting task interference is a major bottleneck in standard PEFT.
- Observation II: When training multiple independent LoRA heads, the down-projection matrices (A) tend to converge to similar values, while the up-projection matrices (B) remain distinct. This motivates the shared-A architecture.
- HydraLoRA achieves better parameter efficiency than independent LoRAs (LoRA-Split) by sharing the A matrix, while maintaining the performance benefits of specialized B matrices.
- The method provides significant training speedups (~2x) and energy savings (~50%) compared to a high-rank standard LoRA baseline.
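The shared-A architecture in Observation II can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: one shared down-projection A feeds several specialized up-projections B_i, mixed by a softmax router over the heads. All names (`hydra_lora_forward`, `Wg`), dimensions, and the zero-initialization of the B heads are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n_heads = 16, 4, 3           # hidden dim, LoRA rank, number of B heads (assumed)

W0 = rng.normal(size=(d, d))       # frozen pretrained weight
A = rng.normal(size=(d, r))        # shared A (down-projection), trainable
Bs = [np.zeros((r, d)) for _ in range(n_heads)]  # per-head B, zero-initialized
Wg = rng.normal(size=(d, n_heads)) # router weights (hypothetical gating)

def hydra_lora_forward(x):
    """y = x W0 + (x A) * sum_i gate_i(x) B_i, with a softmax gate over heads."""
    logits = x @ Wg
    gates = np.exp(logits - logits.max(axis=-1, keepdims=True))
    gates /= gates.sum(axis=-1, keepdims=True)   # softmax over the B heads
    low = x @ A                                  # shared down-projection: (batch, r)
    delta = sum(gates[:, [i]] * (low @ Bs[i]) for i in range(n_heads))
    return x @ W0 + delta                        # frozen path + routed LoRA update
```

Because the B heads start at zero, the layer initially reproduces the frozen model's output, matching the usual LoRA zero-init convention; only A plus the small B_i and router add trainable parameters, which is where the parameter-efficiency gain over independent LoRAs comes from.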