DOI: 10.1145/3615979.3656055
Short paper, open access

Surrogate Modeling for HPC Application Iteration Times Forecasting with Network Features

Published: 24 June 2024

Abstract

Interconnect networks are the foundation of modern high-performance computing (HPC) systems. Parallel discrete event simulation (PDES) is a cornerstone of research on large-scale networking systems because it models and simulates the real-world behavior of HPC facilities, but its computational cost is growing to an unsustainable scale. The research community is therefore interested in building a surrogate-ready PDES framework, in which an accurate surrogate model forecasts HPC behavior and replaces computationally expensive PDES phases. In this paper, we focus on forecasting application iteration times, the key indicator of large-scale networking performance, using network features such as bandwidth consumed and busy time on routers. We introduce five representative methods, namely LAST, Average, ARIMA, LSTM, and our proposed framework LSTM-Feat, to forecast the iteration times of an exemplar application, MILC, running on a dragonfly system. By incorporating network features, LSTM-Feat can capture dependencies between those features and iteration times, which helps it forecast iteration times. The experiments demonstrate the effectiveness of incorporating network features into surrogate models and the potential of surrogate models to accelerate PDES.
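The two simplest forecasters named above can be stated in a few lines. The sketch below is an illustration under stated assumptions, not the authors' code: it assumes LAST predicts the next iteration time as the most recently observed one, and Average predicts the mean of past iteration times (the optional `window` argument is a hypothetical parameter). The `make_windows` helper shows one plausible way to pair each iteration time with that iteration's network features (for example, bandwidth consumed and router busy time) so that a feature-aware sequence model in the spirit of LSTM-Feat could learn from them. All function names, the lookback length, the toy data, and the use of mean absolute error as the metric are assumptions; ARIMA and the LSTM models themselves are omitted.

```python
from statistics import fmean
from typing import List, Optional, Sequence, Tuple

# Hypothetical helper names; the paper's implementation is not shown on this page.


def last_forecast(history: Sequence[float]) -> float:
    """LAST baseline (assumed): the next iteration time equals the latest one."""
    if not history:
        raise ValueError("history must contain at least one iteration time")
    return history[-1]


def average_forecast(history: Sequence[float], window: Optional[int] = None) -> float:
    """Average baseline (assumed): mean of the last `window` iteration times,
    or of the whole history when `window` is None."""
    if not history:
        raise ValueError("history must contain at least one iteration time")
    if window is not None and window < 1:
        raise ValueError("window must be at least 1")
    recent = history if window is None else history[-window:]
    return fmean(recent)


def make_windows(
    iter_times: Sequence[float],
    features: Sequence[Sequence[float]],
    lookback: int,
) -> List[Tuple[List[List[float]], float]]:
    """Build (input window, target) pairs for a feature-aware sequence model.

    Each time step in a window is [iteration time, *network features of that
    iteration]; the target is the iteration time that follows the window.
    """
    if len(iter_times) != len(features):
        raise ValueError("need exactly one feature vector per iteration")
    if lookback < 1:
        raise ValueError("lookback must be at least 1")
    pairs = []
    for end in range(lookback, len(iter_times)):
        window = [[iter_times[t], *features[t]] for t in range(end - lookback, end)]
        pairs.append((window, iter_times[end]))
    return pairs


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Assumed evaluation metric: average absolute forecast error."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and equally long")
    return fmean(abs(a - b) for a, b in zip(y_true, y_pred))


if __name__ == "__main__":
    # Synthetic toy values for illustration only; NOT data from the paper.
    times = [1.0, 1.2, 1.1, 1.5, 1.4]
    feats = [[10.0, 0.2], [12.0, 0.3], [11.0, 0.25], [15.0, 0.5], [14.0, 0.45]]

    # Walk-forward evaluation: forecast iteration t using iterations 0..t-1 only.
    actual = times[1:]
    last_preds = [last_forecast(times[:t]) for t in range(1, len(times))]
    avg_preds = [average_forecast(times[:t]) for t in range(1, len(times))]
    print("LAST MAE:", mean_absolute_error(actual, last_preds))
    print("Average MAE:", mean_absolute_error(actual, avg_preds))
    print("training pairs:", len(make_windows(times, feats, lookback=2)))
```

Swapping in ARIMA, an LSTM, or an LSTM-Feat-style model would only change how the forecast for iteration t is produced; keeping the walk-forward loop identical keeps the comparison between methods like-for-like.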

Published In

SIGSIM-PADS '24: Proceedings of the 38th ACM SIGSIM Conference on Principles of Advanced Discrete Simulation
June 2024
155 pages
ISBN: 9798400703638
DOI: 10.1145/3615979
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. HPC
2. Machine Learning
3. Parallel Discrete Event Simulation

Qualifiers

• Short-paper
• Research
• Refereed limited

Conference

SIGSIM-PADS '24

Acceptance Rates

Overall Acceptance Rate: 398 of 779 submissions, 51%

Bibliometrics

Total Citations: 0
Total Downloads: 126
Downloads (last 12 months): 126
Downloads (last 6 weeks): 26
Reflects downloads up to 15 Jan 2025
