
Model Merging in the Era of Large Language Models: Methods, Applications, and Future Directions

Mingyang Song, Mao Zheng
Large Language Model Department, Tencent, China
arXiv (2026)
Tags: Pretraining, Benchmark, RL, Reasoning

📝 Paper Summary

Topics: Model Merging, Parameter-Efficient Fine-Tuning, Modular Deep Learning
This survey introduces the FUSE taxonomy to systematize model merging, linking theoretical foundations like mode connectivity to practical algorithms for combining LLMs into unified models without retraining.
Core Problem
Deploying separate fine-tuned LLMs for every task is computationally prohibitive, while traditional ensembles incur high inference latency; current literature lacks a unified framework connecting merging theory to practice.
Why it matters:
  • Ensembles require running N models at inference time, multiplying costs linearly
  • Full retraining to combine capabilities is resource-intensive and risks catastrophic forgetting of previously learned behaviors
  • Existing surveys focus on specific sub-areas (like Mixture-of-Experts) or lack theoretical depth regarding why weight interpolation succeeds
Concrete Example: Directly averaging weights of two independently trained networks typically results in catastrophic performance degradation because the models reside in different areas of the loss landscape (disconnected basins) or have misaligned internal permutations.
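The naive averaging described above can be sketched in a few lines. This is a toy illustration, not code from the survey: weights are represented as plain Python dicts of parameter lists, whereas real models would use framework tensors, but the arithmetic is the same. Averaging like this only tends to work when both models descend from the same pretrained checkpoint and sit in a linearly connected loss basin.

```python
# Minimal sketch of naive weight averaging ("model soup" style).
# Toy representation: a model is a dict mapping layer names to
# flat lists of parameters (illustrative, not the paper's code).

def average_weights(model_a, model_b):
    """Elementwise average of two models' parameters, layer by layer."""
    return {
        name: [(wa + wb) / 2 for wa, wb in zip(model_a[name], model_b[name])]
        for name in model_a
    }

# Two toy fine-tuned checkpoints sharing one architecture.
theta_a = {"layer.weight": [0.25, 0.75], "layer.bias": [0.125]}
theta_b = {"layer.weight": [0.75, 0.25], "layer.bias": [0.375]}

merged = average_weights(theta_a, theta_b)
# → {"layer.weight": [0.5, 0.5], "layer.bias": [0.25]}
```

If the two checkpoints were trained from independent random initializations, this same operation would land in a low-performing region between basins, which is exactly the failure mode the example describes.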
Key Novelty
FUSE Taxonomy
  • Foundations: Explains merging via loss-landscape geometry, mode connectivity, and weight symmetries
  • Unification Strategies: Categorizes algorithms, from simple averaging to task-vector arithmetic and geometric interpolation
  • Scenarios: Maps merging to applications such as multi-task learning, safety alignment, and federated learning
  • Ecosystem: Reviews tools (e.g., mergekit), benchmarks, and community resources supporting the field
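The task-vector arithmetic named under Unification Strategies can be sketched as follows. This is an illustrative toy, not the survey's code: a task vector is the parameter delta between a fine-tuned checkpoint and its pretrained base, and merging adds scaled deltas back onto the base. The dict-of-lists representation and function names are assumptions for the sketch.

```python
# Toy task-vector arithmetic: tau = theta_ft - theta_pre,
# merged = theta_pre + lam * sum(tau_i). Models are dicts of
# parameter lists (illustrative representation).

def task_vector(theta_ft, theta_pre):
    """Delta between a fine-tuned model and its pretrained base."""
    return {
        name: [f - p for f, p in zip(theta_ft[name], theta_pre[name])]
        for name in theta_pre
    }

def apply_task_vectors(theta_pre, vectors, lam=1.0):
    """Add scaled task vectors onto the base model's weights."""
    merged = {name: list(params) for name, params in theta_pre.items()}
    for tau in vectors:
        for name in merged:
            merged[name] = [m + lam * t for m, t in zip(merged[name], tau[name])]
    return merged

base = {"w": [1.0, 1.0]}
ft_math = {"w": [1.5, 1.0]}   # toy checkpoint fine-tuned on task A
ft_code = {"w": [1.0, 0.5]}   # toy checkpoint fine-tuned on task B

taus = [task_vector(ft_math, base), task_vector(ft_code, base)]
merged = apply_task_vectors(base, taus, lam=0.5)
# → {"w": [1.25, 0.75]}
```

Choosing the scaling coefficient `lam` per task vector is where the more elaborate algorithms the survey covers (e.g., resolving sign conflicts between deltas) come into play.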
Breakthrough Assessment
8/10
Provides the first comprehensive taxonomy (FUSE) linking complex theoretical properties (loss basins, symmetries) to practical merging algorithms, addressing a rapidly growing sub-field of LLM development.