DOI: 10.1145/3663408.3663432

UniFL: Enabling Loss-tolerant Transmission in Federated Learning

Published: 03 August 2024

Abstract

As Distributed Deep Learning (DDL) gains prominence, network constraints have emerged as a critical bottleneck for DDL performance. While state-of-the-art loss-tolerant (LT) transmission protocols improve DDL efficiency, their application in federated learning (FL) environments is hindered by several challenges: (1) LT protocols require client-side modifications, which are impractical in FL settings; (2) keeping an LT protocol transparent to senders compromises the integrity of congestion control; (3) LT protocols break stream ciphers, which are widely used in FL. To address these hurdles, this paper introduces UniFL, an LT protocol tailored for FL applications. UniFL integrates seamlessly with FL architectures by preserving congestion control through a specialized speed limiter and by adopting an encryption scheme that withstands packet loss, ensuring data integrity. UniFL is implemented in the ns-3 simulator for evaluation, and its efficacy is assessed across diverse models and datasets, demonstrating substantial performance gains for FL. In detail, UniFL delivers up to a 40× speedup over original FL with widely used congestion control algorithms, and achieves throughput close to that of state-of-the-art LT protocols while remaining transparent to the workers.
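The abstract notes that LT protocols break stream ciphers: a conventional stream cipher derives its keystream from the running byte position, so a lost packet desynchronizes every packet that follows. A minimal sketch of the general remedy the abstract alludes to, deriving each packet's keystream segment from its sequence number alone so surviving packets decrypt independently. This is an illustration only (the toy SHA-256 counter keystream, key, and packet format are assumptions, not UniFL's actual construction):

```python
# Toy demonstration: per-packet keystream offsets make XOR encryption
# tolerant to packet loss. Illustrative only; not UniFL's real scheme.
import hashlib

KEY = b"demo-key"  # hypothetical shared key for the sketch

def keystream_block(seq: int, length: int) -> bytes:
    """Derive a keystream segment from the packet sequence number alone,
    so decryption never depends on which earlier packets arrived."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(
            KEY + seq.to_bytes(8, "big") + counter.to_bytes(8, "big")
        ).digest()
        counter += 1
    return out[:length]

def encrypt_packet(seq: int, payload: bytes) -> bytes:
    ks = keystream_block(seq, len(payload))
    return bytes(p ^ k for p, k in zip(payload, ks))

decrypt_packet = encrypt_packet  # XOR is its own inverse

# Sender encrypts packets 0..4; packet 2 is lost in transit.
packets = {i: encrypt_packet(i, f"grad-chunk-{i}".encode()) for i in range(5)}
del packets[2]

# Receiver still decrypts every surviving packet correctly: each keystream
# segment depends only on the sequence number, not on prior bytes received.
recovered = {i: decrypt_packet(i, c) for i, c in packets.items()}
```

With a classic stream cipher, losing packet 2 would corrupt packets 3 and 4 as well; here they decrypt cleanly, which is the property an LT transport needs from its encryption layer.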



Published In

APNet '24: Proceedings of the 8th Asia-Pacific Workshop on Networking
August 2024
230 pages
ISBN:9798400717581
DOI:10.1145/3663408

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Deep Learning
  2. Distributed Training
  3. Federated Learning

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

APNet 2024

Acceptance Rates

APNet '24 Paper Acceptance Rate: 50 of 118 submissions, 42%

