DOI: 10.1145/3458817.3476197

Bootstrapping in-situ workflow auto-tuning via combining performance models of component applications

Published: 13 November 2021


In an in-situ workflow, multiple components such as simulation and analysis applications are coupled with streaming data transfers. The multiplicity of possible configurations necessitates an auto-tuner for workflow optimization. Existing auto-tuning approaches are computationally expensive because many configurations must be sampled by running the whole workflow repeatedly in order to train the auto-tuner surrogate model or otherwise explore the configuration space. To reduce these costs, we instead combine the performance models of component applications by exploiting the analytical workflow structure, selectively generating test configurations to measure and guide the training of a machine learning workflow surrogate model. Because the training can focus on well-performing configurations, the resulting surrogate model can achieve high prediction accuracy for good configurations despite training with fewer total configurations. Experiments with real applications demonstrate that our approach can identify significantly better configurations than other approaches for a fixed computer time budget.
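The candidate-selection idea in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the two component models (`sim_time`, `analysis_time`), their coefficients, and the combination rule (the slower of two concurrently running components bounds the workflow step time) are all assumptions made for the example. The sketch ranks feasible configurations by the combined estimate and keeps only the top few, which are the ones that would then be measured on the real workflow to train a surrogate model.

```python
from itertools import product


# Hypothetical component models: predicted seconds per step as a function of
# the process count given to each component. The formulas are invented for
# illustration; in practice they would be fitted from runs of each component.
def sim_time(procs):
    return 120.0 / procs + 0.05 * procs


def analysis_time(procs):
    return 40.0 / procs + 0.02 * procs


def combined_estimate(sim_procs, ana_procs):
    """Assumed combination rule: the coupled components run concurrently,
    so the slower one bounds the per-step workflow time."""
    return max(sim_time(sim_procs), analysis_time(ana_procs))


def select_candidates(total_procs, options, k):
    """Rank feasible configurations by the combined estimate and keep the best k.

    A configuration is feasible when both components fit in the process budget.
    Only the returned k configurations would be run on the full workflow.
    """
    feasible = [
        (s, a) for s, a in product(options, repeat=2) if s + a <= total_procs
    ]
    return sorted(feasible, key=lambda cfg: combined_estimate(*cfg))[:k]


if __name__ == "__main__":
    for cfg in select_candidates(total_procs=64, options=[4, 8, 16, 32, 48], k=3):
        print(cfg, round(combined_estimate(*cfg), 2))
```

Because the ranking uses only the cheap component models, the full workflow is run just for the shortlisted configurations, which is where the measurement cost lies.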

Supplementary Material

MP4 File (Bootstrapping In-Situ Workflow Auto-Tuning via Combining Performance Models of Component Applications.mp4.mp4)
Presentation video






    Published In

    SC '21: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
    November 2021
    1493 pages
    © 2021 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.



    • IEEE CS


    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. auto-tuning
    2. bootstrapping
    3. component model combination
    4. in-situ workflow


    • Research-article

    Funding Sources

    • U.S. Department of Energy Exascale Computing Project
    • Southern Illinois University Carbondale


    SC '21

    Acceptance Rates

    Overall Acceptance Rate 1,516 of 6,373 submissions, 24%


    Article Metrics

    • Downloads (Last 12 months): 51
    • Downloads (Last 6 weeks): 6
    Reflects downloads up to 24 Dec 2024


    Cited By

    • (2023) Accuracy-Constrained Efficiency Optimization and GPU Profiling of CNN Inference for Detecting Drainage Crossing Locations. Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, 1780-1788. DOI: 10.1145/3624062.3624260. Online publication date: 12-Nov-2023
    • (2023) Pareto Optimization of CNN Models via Hardware-Aware Neural Architecture Search for Drainage Crossing Classification on Resource-Limited Devices. Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, 1767-1775. DOI: 10.1145/3624062.3624258. Online publication date: 12-Nov-2023
    • (2023) Toward Efficient Homomorphic Encryption for Outsourced Databases through Parallel Caching. Proceedings of the ACM on Management of Data 1:1, 1-23. DOI: 10.1145/3588920. Online publication date: 30-May-2023
    • (2023) TopoCommit: A Topological Commit Protocol for Cross-Ledger Transactions in Scientific Computing. 2023 IEEE International Conference on Cluster Computing (CLUSTER), 365-375. DOI: 10.1109/CLUSTER52292.2023.00038. Online publication date: 31-Oct-2023
    • (2023) Adaptive elasticity policies for staging-based in situ visualization. Future Generation Computer Systems 142:C, 75-89. DOI: 10.1016/j.future.2022.12.010. Online publication date: 1-May-2023
    • (2022) Serving unseen deep learning models with near-optimal configurations. Proceedings of the 13th Symposium on Cloud Computing, 461-476. DOI: 10.1145/3542929.3563485. Online publication date: 7-Nov-2022
    • (2021) Online data analysis and reduction: An important Co-design motif for extreme-scale computers. The International Journal of High Performance Computing Applications (109434202110235). DOI: 10.1177/10943420211023549. Online publication date: 12-Jun-2021
