DOI: 10.1145/3573900.3591122
short-paper

Hybrid PDES Simulation of HPC Networks Using Zombie Packets

Published: 21 June 2023

Abstract

High-fidelity network simulations provide insight into new high-performance computing (HPC) architectures, but at a high computational cost. Surrogate models reduce runtime significantly, yet they cannot fully replace high-fidelity simulation and should be used only where appropriate. This motivates hybrid modeling, in which high-fidelity simulation and surrogates run side by side. We present a surrogate model for HPC networks in which packets bypass the network and the network state itself is suspended when switching to the surrogate. To bypass the network, every packet is scheduled to arrive at a predicted future time estimated from historical data. To suspend the network, all in-flight packets are delivered to their destinations but are kept in the system, to awaken as zombies when the simulation switches back to high-fidelity mode. The speedup of a hybrid model depends on the proportion of the simulation spent in surrogate mode versus high-fidelity mode. We obtained a 3× speedup for a simulation in which 70% of virtual time was spent in surrogate mode. Considering the surrogate portion only, the speedup rises to nearly 20× on a uniform random network traffic example. The accuracy of the overall simulation increased when the network state was suspended rather than ignored, which demonstrates the need to model the network state when transitioning from surrogate back to high-fidelity mode.
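
The relationship between the two reported speedups can be illustrated with an Amdahl's-law-style estimate. This is only a sketch, not the authors' model: it assumes that high-fidelity wall-clock cost is proportional to virtual time, which the abstract does not state, and the function name `hybrid_speedup` is hypothetical.

```python
def hybrid_speedup(surrogate_fraction: float, surrogate_speedup: float) -> float:
    """Amdahl-style estimate of overall speedup for a hybrid run.

    surrogate_fraction: share of the baseline (all high-fidelity) runtime
        that is replaced by the surrogate. ASSUMPTION: taken to be equal
        to the share of virtual time spent in surrogate mode.
    surrogate_speedup: speedup of the surrogate over high-fidelity
        simulation on the portion it replaces.
    """
    if not 0.0 <= surrogate_fraction <= 1.0:
        raise ValueError("surrogate_fraction must be in [0, 1]")
    if surrogate_speedup <= 0.0:
        raise ValueError("surrogate_speedup must be positive")
    # Runtime relative to the baseline: the untouched high-fidelity part
    # plus the surrogate part shrunk by its own speedup.
    relative_runtime = (1.0 - surrogate_fraction) + surrogate_fraction / surrogate_speedup
    return 1.0 / relative_runtime


# Figures from the abstract: 70% surrogate, nearly 20x on that portion.
print(round(hybrid_speedup(0.70, 20.0), 2))  # 2.99
```

Under these assumptions the estimate comes out close to the reported 3× overall speedup, and it shows why the high-fidelity portion limits the overall gain: even an infinitely fast surrogate covering 70% of the run could not exceed roughly 3.3×.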

Published In

SIGSIM-PADS '23: Proceedings of the 2023 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation
June 2023
173 pages
ISBN:9798400700309
DOI:10.1145/3573900
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. HPC networks
  2. Parallel Discrete-event Simulation
  3. Surrogate simulation

Qualifiers

  • Short-paper
  • Research
  • Refereed limited

Conference

SIGSIM-PADS '23

Acceptance Rates

Overall Acceptance Rate 398 of 779 submissions, 51%
