Early evaluation of directive-based GPU programming models for productive exascale computing

Authors:

Jeffrey S. Vetter

SC '12: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

Article No.: 23, Pages 1 - 11

Published: 10 November 2012

Abstract

Graphics Processing Unit (GPU)-based parallel computer architectures have become increasingly popular as a building block for high-performance computing, and possibly for future Exascale computing. However, their programming complexity remains a major hurdle to widespread adoption. To provide better abstractions for programming GPU architectures, researchers and vendors have proposed several directive-based GPU programming models. These models provide different levels of abstraction and require different levels of programming effort to port and optimize applications. Understanding the differences among these new models provides valuable insight into their applicability and performance potential. In this paper, we evaluate existing directive-based models by porting thirteen application kernels from various scientific domains to CUDA GPUs, which in turn allows us to identify important issues in the functionality, scalability, tunability, and debuggability of the existing models. Our evaluation shows that directive-based models can achieve reasonable performance compared to hand-written GPU codes.

Published In

SC '12: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

November 2012

1161 pages

ISBN:9781467308045

General Chair:
Jeffrey K. Hollingsworth
University of Maryland

Publisher

IEEE Computer Society Press

Washington, DC, United States

Conference

SC '12

Sponsor:

SIGHPC
SIGARCH
IEEE-CS

SC '12: International Conference for High Performance Computing, Networking, Storage and Analysis

November 10 - 16, 2012

Salt Lake City, Utah

Acceptance Rates

SC '12 Paper Acceptance Rate 100 of 461 submissions, 22%;

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%
