DOI: 10.1145/3698038.3698548
Research Article

FaPES: Enabling Efficient Elastic Scaling for Serverless Machine Learning Platforms

Published: 20 November 2024

Abstract

Serverless computing platforms have become increasingly popular for running machine learning (ML) tasks because they are easy to use and decouple applications from the underlying infrastructure. However, auto-scaling to serve incoming requests efficiently remains a challenge, especially for distributed ML training and inference jobs in a serverless GPU cluster. Distributed training and inference jobs are highly sensitive to resource configurations and demand high model efficiency throughout their lifecycle. We propose FaPES, a FaaS-oriented Performance-aware Elastic Scaling system that enables efficient resource allocation for ML jobs on serverless platforms. FaPES supports flexible resource loaning between the virtual clusters that run training and inference jobs. For inference jobs, servers are reclaimed on demand with minimal preemption overhead to guarantee the service-level objective (SLO); for training jobs, GPU allocations and model hyperparameters are jointly adapted based on an ML-based performance model and a resource-usage prediction board, relieving users of model tuning and resource specification. Evaluation on a 128-GPU testbed demonstrates up to 24.8% reduction in job completion time and up to 1.8× goodput improvement over representative elastic scaling schemes.
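To make the joint adaptation described above concrete, the sketch below searches a small grid of GPU counts and batch sizes for the configuration with the highest predicted goodput (throughput weighted by statistical efficiency). This is a minimal illustration only: the throughput model, the statistical-efficiency formula, the constants, and all function names are hypothetical assumptions for exposition, not FaPES's actual ML-based performance model or resource-usage prediction board.

```python
# Hypothetical sketch of goodput-driven joint adaptation of GPU count and
# batch size. All models and constants are illustrative placeholders.
from itertools import product


def throughput(num_gpus: int, batch_size: int) -> float:
    """Predicted samples/second: near-linear GPU scaling, dampened by a
    made-up synchronization (all-reduce) overhead term."""
    per_gpu = batch_size / (1.0 + 0.02 * batch_size)    # single-GPU rate
    sync_penalty = 1.0 / (1.0 + 0.05 * (num_gpus - 1))  # communication cost
    return num_gpus * per_gpu * sync_penalty


def statistical_efficiency(batch_size: int, critical_batch: int = 256) -> float:
    """Training progress per sample: batches beyond a critical size yield
    diminishing returns (illustrative formula)."""
    return min(1.0, critical_batch / batch_size)


def goodput(num_gpus: int, batch_size: int) -> float:
    """Goodput = raw throughput weighted by statistical efficiency."""
    return throughput(num_gpus, batch_size) * statistical_efficiency(batch_size)


def choose_config(gpu_options, batch_options):
    """Pick the (GPU count, batch size) pair with the highest predicted goodput."""
    return max(product(gpu_options, batch_options), key=lambda cfg: goodput(*cfg))


if __name__ == "__main__":
    gpus, batch = choose_config(gpu_options=[1, 2, 4, 8],
                                batch_options=[64, 128, 256, 512])
    print(f"allocate {gpus} GPU(s), batch size {batch}, "
          f"predicted goodput {goodput(gpus, batch):.1f} samples/s")
```

Under these assumed models, the search favors the largest batch that does not hurt statistical efficiency, illustrating why GPU allocation and hyperparameters are better adapted jointly than tuned in isolation.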



    Published In

    SoCC '24: Proceedings of the 2024 ACM Symposium on Cloud Computing
    November 2024
    1062 pages
    ISBN: 9798400712869
    DOI: 10.1145/3698038
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. Cluster Scheduling
    2. Distributed System

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • Alibaba Group
    • Hong Kong RGC

    Conference

    SoCC '24: ACM Symposium on Cloud Computing
    November 20 - 22, 2024
    Redmond, WA, USA

    Acceptance Rates

    Overall Acceptance Rate 169 of 722 submissions, 23%


    Article Metrics

    • Total Citations: 0
    • Total Downloads: 177
    • Downloads (Last 12 months): 177
    • Downloads (Last 6 weeks): 41
    Reflects downloads up to 08 Feb 2025
