I recently came across an insightful paper by Rhui Dih Lee, Laura Wynter, and Raghu Kiran Ganti from IBM Research that presents a clever approach to enhancing large language models (LLMs). Their research introduces a toolkit for creating Mixture-of-Domain-Experts (MOE) models by mixing pre-trained and fine-tuned models, all without the need for extensive retraining or high costs.
This toolkit lets you augment a model's performance across multiple domains, such as general knowledge, math, or finance, leading to more efficient and specialized outcomes. They've designed flexible routing methods: Gate-less MOE, which mixes all experts without training a router, and Noisy MOE, which routes each token to only a subset of experts to keep inference costs down. And the best part? The toolkit is open-source, making it accessible for anyone eager to experiment and innovate.
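To make the routing idea concrete, here is a minimal PyTorch sketch of the two styles as I understand them: gate-less mixing averages the outputs of all experts, while noisy routing perturbs untrained scores and keeps only the top-k experts per token. This is my own illustration under those assumptions, not the toolkit's actual API; the ExpertFFN, GatelessMoE, and NoisyMoE names are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ExpertFFN(nn.Module):
    """Stand-in for a feed-forward block lifted from a pre-trained or fine-tuned model."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model)
        )

    def forward(self, x):
        return self.net(x)


class GatelessMoE(nn.Module):
    """Gate-less mixing: every expert processes every token and the outputs are
    averaged, so there is no router to train at all."""

    def __init__(self, experts):
        super().__init__()
        self.experts = nn.ModuleList(experts)

    def forward(self, x):
        return torch.stack([e(x) for e in self.experts], dim=0).mean(dim=0)


class NoisyMoE(nn.Module):
    """Noisy routing: perturb untrained per-expert scores with Gaussian noise and
    keep only the top-k experts per token. For brevity this sketch still evaluates
    every expert densely; a real implementation would dispatch only the selected
    tokens, which is where the inference savings come from."""

    def __init__(self, experts, d_model: int, k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(experts)
        self.k = k
        self.score = nn.Linear(d_model, len(experts), bias=False)  # left untrained

    def forward(self, x):  # x: (batch, seq, d_model)
        scores = self.score(x)
        scores = scores + torch.randn_like(scores)          # noisy, untrained gating
        topk_vals, topk_idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(topk_vals, dim=-1)               # (batch, seq, k)
        out = torch.zeros_like(x)
        for e_id, expert in enumerate(self.experts):
            # weight for this expert, zero wherever it was not selected in the top-k
            w = (weights * (topk_idx == e_id)).sum(dim=-1, keepdim=True)
            out = out + w * expert(x)
        return out


if __name__ == "__main__":
    d_model, d_hidden = 64, 256
    # e.g. one expert each for general knowledge, math, finance, code
    experts = [ExpertFFN(d_model, d_hidden) for _ in range(4)]
    x = torch.randn(2, 10, d_model)
    print(GatelessMoE(experts)(x).shape)             # torch.Size([2, 10, 64])
    print(NoisyMoE(experts, d_model, k=2)(x).shape)  # torch.Size([2, 10, 64])
```

The appeal of both variants is that the experts can come straight from existing pre-trained or fine-tuned checkpoints, and neither scheme requires training a router before the mixture is usable.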
If you're interested in diving deeper into this, I highly recommend checking out the article itself.
arXiv: 2408.17280