DOI: 10.1145/3126908.3126969

Performance modeling under resource constraints using deep transfer learning

Published: 12 November 2017


Tuning application parameters for optimal performance is a challenging combinatorial problem. Hence, techniques for modeling the functional relationships between input features in the parameter space and application performance are important. We show that simple statistical inference techniques are inadequate to capture these relationships. Even with more complex ensembles of models, the minimum coverage of the parameter space required via experimental observations is still quite large. We propose a deep-learning-based approach that combines information from exhaustive observations collected at a smaller scale with limited observations collected at a larger target scale. The proposed approach accurately predicts performance in the regimes of interest to performance analysts while outperforming many traditional techniques. In particular, our approach can identify the best-performing configurations even when trained using as few as 1% of observations at the target scale.
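The abstract does not describe the network architecture or training procedure, so the following is only a minimal sketch of the general idea of transferring from a small scale to a target scale. It assumes a synthetic one-dimensional "configuration to runtime" function, a tiny one-hidden-layer network, and one common transfer-learning variant (pretrain on plentiful source data, then freeze the hidden layer and refit only the output layer on a handful of target samples). All names (`make_data`, `init_model`, `train`) and all constants are hypothetical and are not taken from the paper.

```python
import math
import random

def make_data(n, scale, rng):
    """Synthetic stand-in for (configuration, runtime) measurements.
    `scale` stretches the response to mimic a different problem scale."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [scale * (math.sin(3.0 * x) + 0.1 * x) for x in xs]
    return xs, ys

def init_model(hidden, rng):
    """One hidden tanh layer followed by a linear output unit."""
    return {
        "w1": [rng.uniform(-1.0, 1.0) for _ in range(hidden)],
        "b1": [rng.uniform(-1.0, 1.0) for _ in range(hidden)],
        "w2": [rng.uniform(-0.1, 0.1) for _ in range(hidden)],
        "b2": 0.0,
    }

def features(model, x):
    return [math.tanh(w * x + b) for w, b in zip(model["w1"], model["b1"])]

def predict(model, x):
    h = features(model, x)
    return sum(w * hi for w, hi in zip(model["w2"], h)) + model["b2"]

def loss(model, xs, ys):
    """Mean of half squared errors."""
    return sum(0.5 * (predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def train(model, xs, ys, epochs, lr, freeze_hidden=False):
    """Full-batch gradient descent; optionally keep the hidden layer fixed."""
    n = len(xs)
    hidden = len(model["w1"])
    for _ in range(epochs):
        g_w1 = [0.0] * hidden
        g_b1 = [0.0] * hidden
        g_w2 = [0.0] * hidden
        g_b2 = 0.0
        for x, y in zip(xs, ys):
            h = features(model, x)
            err = sum(w * hi for w, hi in zip(model["w2"], h)) + model["b2"] - y
            g_b2 += err / n
            for j in range(hidden):
                g_w2[j] += err * h[j] / n
                back = err * model["w2"][j] * (1.0 - h[j] ** 2) / n
                g_w1[j] += back * x
                g_b1[j] += back
        # Apply the accumulated gradients after the full pass over the data.
        model["b2"] -= lr * g_b2
        for j in range(hidden):
            model["w2"][j] -= lr * g_w2[j]
            if not freeze_hidden:
                model["w1"][j] -= lr * g_w1[j]
                model["b1"][j] -= lr * g_b1[j]
    return model

rng = random.Random(0)
src_x, src_y = make_data(100, scale=1.0, rng=rng)  # plentiful small-scale data
tgt_x, tgt_y = make_data(5, scale=1.5, rng=rng)    # only a few target-scale runs

model = init_model(hidden=8, rng=rng)
train(model, src_x, src_y, epochs=200, lr=0.05)    # pretrain on the source scale

frozen_w1 = list(model["w1"])                      # snapshot of the reused layer
before = loss(model, tgt_x, tgt_y)
train(model, tgt_x, tgt_y, epochs=300, lr=0.05, freeze_hidden=True)
after = loss(model, tgt_x, tgt_y)

print("target loss before fine-tuning:", before)
print("target loss after fine-tuning: ", after)
```

Freezing the hidden layer keeps the representation learned from the source data and leaves only a small linear fit for the scarce target samples. Whether this, full fine-tuning, or some other scheme matches the paper's method cannot be determined from the abstract alone.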





Published In

SC '17: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
November 2017
801 pages
  • General Chair: Bernd Mohr
  • Program Chair: Padma Raghavan
© 2017 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.





Association for Computing Machinery

New York, NY, United States


Author Tags

  1. deep learning
  2. parameter selection
  3. performance prediction
  4. transfer learning


  • Research-article


SC '17

Acceptance Rates

SC '17 Paper Acceptance Rate 61 of 327 submissions, 19%;
Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

