Cited By
- Zhang Z, Xia Y, Wang H, Yang D, Hu C, Zhou X, Cheng D (2024). MPMoE: Memory Efficient MoE for Pre-Trained Models With Adaptive Pipeline Parallelism. IEEE Transactions on Parallel and Distributed Systems 35:6 (998-1011). DOI: 10.1109/TPDS.2024.3385639. Online publication date: Jun-2024.
- Pan X, Lin W, Shi S, Chu X, Sun W, Li B (2024). Parm: Efficient Training of Large Sparsely-Activated Models with Dedicated Schedules. IEEE INFOCOM 2024 - IEEE Conference on Computer Communications (1880-1889). DOI: 10.1109/INFOCOM52122.2024.10621327. Online publication date: 20-May-2024.