DOI: 10.1145/2833157.2833164
Research Article

LLVM-based communication optimizations for PGAS programs

Published: 15 November 2015

Abstract

While Partitioned Global Address Space (PGAS) programming languages such as UPC/UPC++, CAF, Chapel, and X10 provide high-level programming models that facilitate large-scale distributed-memory parallel programming, it is widely recognized that compiler analysis and optimization for these languages have been very limited, unlike the optimization of SMP models such as OpenMP. One reason for this limitation is that current optimizers for PGAS programs are specialized to individual languages. This is unfortunate because communication optimization is an important class of compiler optimizations for PGAS programs running on distributed-memory platforms, and these optimizations need to be performed more widely. A more effective approach would therefore be to build a language-independent and runtime-independent compiler framework for optimizing PGAS programs, so that new communication optimizations can be leveraged across languages.
To address this need, we introduce a communication optimization framework based on LLVM (Low Level Virtual Machine). Our compilation system leverages existing optimization passes and introduces new PGAS-language-aware, runtime-dependent and runtime-independent passes to reduce communication overheads. Our experimental results show average performance improvements of 3.5× on 64 nodes of a Cray XC30 supercomputer and 3.4× on 32 nodes of a Westmere cluster for a set of benchmarks written in the Chapel language. Overall, we show that our new LLVM-based compiler optimization framework can effectively improve the performance of PGAS programs.
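Communication coalescing is a representative example of the kind of optimization such a framework performs: rewriting many fine-grained remote reads into a single bulk transfer. The following Python sketch is purely illustrative (it is not the paper's implementation, and the `RemoteArray` class with its one-message-per-operation cost model is hypothetical); it shows why coalescing reduces the number of network messages while producing the same values.

```python
# Illustrative sketch (not the paper's implementation): why coalescing
# fine-grained remote reads into one bulk transfer cuts communication cost.
# RemoteArray and its per-message cost model are hypothetical.

class RemoteArray:
    """Simulates an array living on a remote PGAS locale."""
    def __init__(self, data):
        self.data = list(data)
        self.messages = 0          # network messages issued so far

    def get(self, i):
        self.messages += 1         # one message per element (naive code)
        return self.data[i]

    def get_block(self, lo, hi):
        self.messages += 1         # one message for the whole block (coalesced)
        return self.data[lo:hi]

remote = RemoteArray(range(100))

# Naive: leaving each remote read alone issues one message per element.
naive = [remote.get(i) for i in range(10)]
naive_msgs = remote.messages       # 10 messages

remote.messages = 0
# Coalesced: a communication-optimizing pass rewrites the loop into one bulk get.
coalesced = remote.get_block(0, 10)
coalesced_msgs = remote.messages   # 1 message

assert naive == coalesced          # same values, far fewer messages
print(naive_msgs, coalesced_msgs)
```

The same idea generalizes to strided and indirect accesses, where the compiler must first prove that the accesses target the same remote locale before merging them.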



    Published In

    LLVM '15: Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC
    November 2015, 74 pages
    ISBN: 9781450340052
    DOI: 10.1145/2833157
    Conference Chair: Hal Finkel

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. Chapel
    2. LLVM
    3. PGAS languages
    4. communication optimizations

    Qualifiers

    • Research-article

    Funding Sources

    • NSF

    Conference

    SC15

    Acceptance Rates

    LLVM '15 Paper Acceptance Rate: 7 of 12 submissions, 58%
    Overall Acceptance Rate: 16 of 22 submissions, 73%
