DOI: 10.5555/2388996.2389028

Early evaluation of directive-based GPU programming models for productive exascale computing

Published: 10 November 2012

Abstract

Graphics Processing Unit (GPU)-based parallel computer architectures have shown increased popularity as a building block for high performance computing, and possibly for future Exascale computing. However, their programming complexity remains a major hurdle to their widespread adoption. To provide better abstractions for programming GPU architectures, researchers and vendors have proposed several directive-based GPU programming models. These directive-based models provide different levels of abstraction and require different levels of programming effort to port and optimize applications. Understanding the differences among these new models provides valuable insights into their applicability and performance potential. In this paper, we evaluate existing directive-based models by porting thirteen application kernels from various scientific domains to use CUDA GPUs, which, in turn, allows us to identify important issues in the functionality, scalability, tunability, and debuggability of the existing models. Our evaluation shows that directive-based models can achieve reasonable performance compared to hand-written GPU codes.



Published In

SC '12: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
November 2012
1161 pages
ISBN: 9781467308045

Publisher

IEEE Computer Society Press, Washington, DC, United States


Qualifiers

• Research-article

Conference

SC '12

Acceptance Rates

SC '12 Paper Acceptance Rate: 100 of 461 submissions, 22%
Overall Acceptance Rate: 1,516 of 6,373 submissions, 24%
