research-article

Performance modeling under resource constraints using deep transfer learning

Authors:

Aniruddha Marathe,

Rushil Anirudh,

Abhinav Bhatele,

Jayaraman Thiagarajan,

Bhavya Kailkhura,

Jae-Seung Yeom,

Barry Rountree,

Todd Gamblin

SC '17: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis

Article No. 31, pages 1-12

https://doi.org/10.1145/3126908.3126969

Published: 12 November 2017

Abstract

Tuning application parameters for optimal performance is a challenging combinatorial problem. Hence, techniques for modeling the functional relationships between various input features in the parameter space and application performance are important. We show that simple statistical inference techniques are inadequate to capture these relationships. Even with more complex ensembles of models, the minimum coverage of the parameter space that must be obtained through experimental observations is still quite large. We propose a deep-learning-based approach that combines information from exhaustive observations collected at a smaller scale with limited observations collected at a larger target scale. The proposed approach accurately predicts performance in the regimes of interest to performance analysts and outperforms many traditional techniques. In particular, it can identify the best-performing configurations even when trained on as few as 1% of the observations at the target scale.
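The core idea of combining exhaustive small-scale observations with a handful of target-scale measurements can be illustrated with a toy sketch. This is not the authors' deep transfer-learning model: purely as an assumption for illustration, a least-squares affine recalibration of the small-scale runtimes stands in for the network, and the two tunable parameters `a` and `b`, the runtime functions, and all data are hypothetical and synthetic.

```python
"""Toy transfer sketch (hypothetical data; an affine recalibration stands in
for the paper's deep network, purely for illustration)."""

from itertools import product


def small_scale_runtime(a, b):
    # Hypothetical runtime, measured for every configuration at the small scale.
    return 10 + 0.5 * (a - 7) ** 2 + 2 * b


def target_scale_runtime(a, b):
    # Hypothetical "ground truth" at the target scale: an affine rescaling of
    # the small-scale runtime plus a small configuration-dependent term (<= 0.4).
    return 3 + 1.8 * small_scale_runtime(a, b) + 0.1 * ((7 * a + 3 * b) % 5)


def fit_affine(xs, ys):
    """Least-squares fit of y ~ alpha + beta * x; needs at least two distinct xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    beta = sxy / sxx
    return my - beta * mx, beta


# Exhaustive small-scale sweep over 200 hypothetical configurations (a, b).
configs = list(product(range(20), range(10)))
small = {c: small_scale_runtime(*c) for c in configs}

# "Measure" only 2 of the 200 configurations (1%) at the target scale.
sampled = [(0, 0), (19, 9)]
alpha, beta = fit_affine([small[c] for c in sampled],
                         [target_scale_runtime(*c) for c in sampled])

# Transfer: map every small-scale observation to a target-scale prediction,
# then pick the configuration with the lowest predicted runtime.
predicted = {c: alpha + beta * small[c] for c in configs}
best_predicted = min(predicted, key=predicted.get)
best_true = min(configs, key=lambda c: target_scale_runtime(*c))
max_err = max(abs(predicted[c] - target_scale_runtime(*c)) for c in configs)
print(best_predicted, best_true, round(max_err, 3))
```

In this toy setting the fit recovers alpha ≈ 3 and beta ≈ 1.8, the predicted and true best configurations are both (7, 0), and the largest prediction error is about 0.4, the size of the configuration-dependent term the affine map cannot capture. The paper learns the cross-scale relationship with a deep network instead of assuming it is affine; the sketch only shows why exhaustive small-scale data can make a few target-scale measurements go a long way.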


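Cited By

Li L, Flynn T, Hoisie A (2024). Learning Generalizable Program and Architecture Representations for Performance Modeling. Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, 1-15. Online publication date: 17-Nov-2024.
https://dl.acm.org/doi/10.1109/SC41406.2024.00072
Dey A, Dhakal A, Islam T, Yeom J, Patki T, Nichols D, Movsesyan A, Bhatele A (2024). Relative Performance Prediction Using Few-Shot Learning. 2024 IEEE 48th Annual Computers, Software, and Applications Conference (COMPSAC), 1764-1769. Online publication date: 2-Jul-2024.
https://doi.org/10.1109/COMPSAC61105.2024.00278
Banday B, Islam T, Marathe A (2024). PERFGEN: A Synthesis and Evaluation Framework for Performance Data using Generative AI. 2024 IEEE 48th Annual Computers, Software, and Applications Conference (COMPSAC), 188-197. Online publication date: 2-Jul-2024.
https://doi.org/10.1109/COMPSAC61105.2024.00035
Ayana G, Dese K, Abagaro A, Jeong K, Yoon S, Choe S (2024). Multistage transfer learning for medical images. Artificial Intelligence Review 57:9. Online publication date: 6-Aug-2024.
https://doi.org/10.1007/s10462-024-10855-7
Besnard J, Tarraf A, Cascajo A, Shende S (2024). Introducing the Metric Proxy for Holistic I/O Measurements. High Performance Computing. ISC High Performance 2024 International Workshops, 213-226. Online publication date: 14-Dec-2024.
https://doi.org/10.1007/978-3-031-73716-9_15
Gong J, Chen T, Chandra S, Blincoe K, Tonella P (2023). Predicting Software Performance with Divide-and-Learn. Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 858-870. Online publication date: 30-Nov-2023.
https://dl.acm.org/doi/10.1145/3611643.3616334
Hutter E, Solomonik E, Mohror K, Arnold D, Badia R (2023). Application Performance Modeling via Tensor Completion. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 1-14. Online publication date: 12-Nov-2023.
https://dl.acm.org/doi/10.1145/3581784.3607069
Randall T, Koo J, Videau B, Kruse M, Wu X, Hovland P, Hall M, Ge R, Balaprakash P, Gallivan K, Nikolopoulos D, Beivide R, Gallopoulos E (2023). Transfer-learning-based Autotuning using Gaussian Copula. Proceedings of the 37th International Conference on Supercomputing, 37-49. Online publication date: 21-Jun-2023.
https://dl.acm.org/doi/10.1145/3577193.3593712
Qi X, Chen J, Deng L (2023). CP³: Hierarchical Cross-Platform Power/Performance Prediction Using a Transfer Learning Approach. Algorithms and Architectures for Parallel Processing, 117-138. Online publication date: 11-Jan-2023.
https://doi.org/10.1007/978-3-031-22677-9_7
Dorier M, Egele R, Balaprakash P, Koo J, Madireddy S, Ramesh S, Malony A, Ross R (2022). HPC Storage Service Autotuning Using Variational-Autoencoder-Guided Asynchronous Bayesian Optimization. 2022 IEEE International Conference on Cluster Computing (CLUSTER), 381-393. Online publication date: Sep-2022.
https://doi.org/10.1109/CLUSTER51413.2022.00049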


Published In


SC '17: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis

November 2017

801 pages

ISBN:9781450351140

DOI:10.1145/3126908

General Chair: Bernd Mohr, Jülich Supercomputing Center, Jülich, Germany

Program Chair: Padma Raghavan, Vanderbilt University, Nashville, TN

Copyright © 2017 ACM.

© 2017 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Sponsors

SIGHPC: ACM Special Interest Group on High Performance Computing

In-Cooperation

IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States


Conference

SC '17: The International Conference for High Performance Computing, Networking, Storage and Analysis

November 12-17, 2017, Denver, Colorado

Sponsor: SIGHPC

Acceptance Rates

SC '17 paper acceptance rate: 61 of 327 submissions (19%)

Overall acceptance rate: 1,516 of 6,373 submissions (24%)


Bibliometrics

Total citations: 36

Total downloads: 741

Downloads (last 12 months): 60

Downloads (last 6 weeks): 11

Reflects downloads up to 25 Dec 2024.

