DOI: 10.1145/3663408.3663432

UniFL: Enabling Loss-tolerant Transmission in Federated Learning

Published: 03 August 2024

Abstract

As Distributed Deep Learning (DDL) gains prominence, network constraints have emerged as a critical bottleneck for DDL performance. While state-of-the-art loss-tolerant (LT) transmission protocols improve DDL efficiency, their application in federated learning (FL) environments is hindered by several challenges: (1) LT protocols require client-side modifications, which are impractical in FL settings; (2) keeping an LT protocol transparent to senders compromises the integrity of congestion control; (3) LT protocols break stream ciphers, which are widely used in FL. To address these hurdles, this paper introduces UniFL, an LT protocol tailored for FL applications. UniFL integrates seamlessly with FL architectures by preserving congestion control through a specialized speed limiter and by adopting an encryption scheme that withstands packet loss, ensuring data integrity. UniFL is implemented in the ns-3 simulator for evaluation, and its efficacy is assessed across diverse models and datasets, demonstrating substantial performance gains for FL. In detail, UniFL delivers up to a 40× speedup over original FL with widely used congestion control algorithms, and achieves throughput close to that of state-of-the-art LT protocols while remaining transparent to the workers.
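The abstract notes that LT protocols break stream ciphers: a conventional stream cipher derives its keystream from the running byte position, so a lost packet desynchronizes every packet that follows. A minimal sketch of the general remedy the abstract alludes to, deriving each packet's keystream segment from its sequence number alone so surviving packets decrypt independently. This is an illustration only (the toy SHA-256 counter keystream, key, and packet format are assumptions, not UniFL's actual construction):

```python
# Toy demonstration: per-packet keystream offsets make XOR encryption
# tolerant to packet loss. Illustrative only; not UniFL's real scheme.
import hashlib

KEY = b"demo-key"  # hypothetical shared key for the sketch

def keystream_block(seq: int, length: int) -> bytes:
    """Derive a keystream segment from the packet sequence number alone,
    so decryption never depends on which earlier packets arrived."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(
            KEY + seq.to_bytes(8, "big") + counter.to_bytes(8, "big")
        ).digest()
        counter += 1
    return out[:length]

def encrypt_packet(seq: int, payload: bytes) -> bytes:
    ks = keystream_block(seq, len(payload))
    return bytes(p ^ k for p, k in zip(payload, ks))

decrypt_packet = encrypt_packet  # XOR is its own inverse

# Sender encrypts packets 0..4; packet 2 is lost in transit.
packets = {i: encrypt_packet(i, f"grad-chunk-{i}".encode()) for i in range(5)}
del packets[2]

# Receiver still decrypts every surviving packet correctly: each keystream
# segment depends only on the sequence number, not on prior bytes received.
recovered = {i: decrypt_packet(i, c) for i, c in packets.items()}
```

With a classic stream cipher, losing packet 2 would corrupt packets 3 and 4 as well; here they decrypt cleanly, which is the property an LT transport needs from its encryption layer.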



Published In

APNet '24: Proceedings of the 8th Asia-Pacific Workshop on Networking
August 2024
230 pages
ISBN:9798400717581
DOI:10.1145/3663408

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Deep Learning
  2. Distributed Training
  3. Federated Learning

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

APNet 2024

Acceptance Rates

APNet '24 Paper Acceptance Rate: 50 of 118 submissions, 42%

