DOI: 10.1145/2578948.2560689
tutorial

Autotuning Wavefront Applications for Multicore Multi-GPU Hybrid Architectures

Published: 07 February 2014

Abstract

Manual tuning of applications for heterogeneous parallel systems is tedious and complex. Optimizations are often not portable, and the whole process must be repeated when moving to a new system, or sometimes even to a different problem size. Pattern-based programming models provide structure that can assist in the creation of autotuners for such problems. We present a machine-learning-based autotuning framework that partitions the work created by applications following the wavefront pattern across systems comprising multicore CPUs and multiple GPU accelerators. The use of a pattern facilitates training on synthetically generated instances. Exhaustive search-space exploration on real applications indicates that correct setting of the tuning factors leads to a maximum speedup of 20x over an optimized sequential baseline, with an average of 7.8x. Our machine-learned heuristics obtain 98% of this speedup, averaged across a range of applications and architectures.
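As an illustration of the wavefront pattern the abstract refers to, the following is a minimal Python sketch, not the framework described in the paper. It computes a longest-common-subsequence table by sweeping anti-diagonals: every cell depends only on its north, west, and north-west neighbours, so all cells on one anti-diagonal are independent and could be shared among devices. The function names (`anti_diagonals`, `partition`, `lcs_wavefront`) and the even chunking rule are assumptions made for this example; the paper's tuning factors and learned heuristics are not reproduced here, and the chunks are processed sequentially rather than dispatched to real CPU or GPU workers.

```python
from typing import List, Sequence, Tuple

Cell = Tuple[int, int]


def anti_diagonals(rows: int, cols: int) -> List[List[Cell]]:
    """Group the cells of a rows x cols grid by anti-diagonal (i + j == d)."""
    diagonals = []
    for d in range(rows + cols - 1):
        i_lo = max(0, d - (cols - 1))
        i_hi = min(rows - 1, d)
        diagonals.append([(i, d - i) for i in range(i_lo, i_hi + 1)])
    return diagonals


def partition(cells: Sequence, num_workers: int) -> List[list]:
    """Split cells into at most num_workers contiguous, near-equal chunks.

    Hypothetical stand-in for a work-partitioning decision; a real tuner
    would choose the split per device instead of dividing evenly.
    """
    base, extra = divmod(len(cells), num_workers)
    chunks, start = [], 0
    for w in range(num_workers):
        size = base + (1 if w < extra else 0)
        if size:
            chunks.append(list(cells[start:start + size]))
        start += size
    return chunks


def lcs_wavefront(a: str, b: str, num_workers: int = 3) -> int:
    """Length of the longest common subsequence, filled in wavefront order."""
    rows, cols = len(a) + 1, len(b) + 1
    table = [[0] * cols for _ in range(rows)]
    for diagonal in anti_diagonals(rows, cols):
        # Cells within one diagonal are independent, so each chunk could be
        # handed to a different worker; here they simply run one after another.
        for chunk in partition(diagonal, num_workers):
            for i, j in chunk:
                if i == 0 or j == 0:
                    continue  # boundary row and column stay at zero
                if a[i - 1] == b[j - 1]:
                    table[i][j] = table[i - 1][j - 1] + 1
                else:
                    table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[rows - 1][cols - 1]


if __name__ == "__main__":
    print(lcs_wavefront("AGGTAB", "GXTXAYB"))
```

The diagonals near the corners of the grid contain only a few cells, while those in the middle are long. That variation in available parallelism is one reason the choice of how to divide work between devices is worth tuning.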

Published In

PMAM'14: Proceedings of Programming Models and Applications on Multicores and Manycores
February 2014
156 pages
ISBN:9781450326575
DOI:10.1145/2578948
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. auto-tuning
  2. multi-GPU
  3. wavefront pattern

Qualifiers

  • Tutorial
  • Research
  • Refereed limited

Conference

PPoPP '14
Acceptance Rates

Overall Acceptance Rate 53 of 97 submissions, 55%
