DOI: 10.1145/2833157.2833164
Research Article

LLVM-based communication optimizations for PGAS programs

Published: 15 November 2015

Abstract

While Partitioned Global Address Space (PGAS) programming languages such as UPC/UPC++, CAF, Chapel, and X10 provide high-level programming models that facilitate large-scale distributed-memory parallel programming, it is widely recognized that compiler analysis and optimization for these languages have been very limited, unlike the optimization of SMP models such as OpenMP. One reason for this limitation is that current optimizers for PGAS programs are specialized to individual languages. This is unfortunate because communication optimization is an important class of compiler optimizations for PGAS programs running on distributed-memory platforms, and these optimizations need to be performed more widely. A more effective approach would therefore be to build a language-independent and runtime-independent compiler framework for optimizing PGAS programs, so that new communication optimizations can be leveraged across languages.
To address this need, we introduce a communication optimization framework based on LLVM (Low Level Virtual Machine). Our compilation system leverages existing optimization passes and introduces new PGAS-language-aware, runtime-dependent and runtime-independent passes to reduce communication overheads. Our experimental results show average performance improvements of 3.5× on 64 nodes of a Cray XC30 supercomputer and 3.4× on 32 nodes of a Westmere cluster for a set of benchmarks written in the Chapel language. Overall, we show that our new LLVM-based compiler optimization framework can effectively improve the performance of PGAS programs.
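Communication coalescing is a representative example of the kind of optimization such a framework performs: rewriting many fine-grained remote reads into a single bulk transfer. The following Python sketch is purely illustrative (it is not the paper's implementation, and the `RemoteArray` class with its one-message-per-operation cost model is hypothetical); it shows why coalescing reduces the number of network messages while producing the same values.

```python
# Illustrative sketch (not the paper's implementation): why coalescing
# fine-grained remote reads into one bulk transfer cuts communication cost.
# RemoteArray and its per-message cost model are hypothetical.

class RemoteArray:
    """Simulates an array living on a remote PGAS locale."""
    def __init__(self, data):
        self.data = list(data)
        self.messages = 0          # network messages issued so far

    def get(self, i):
        self.messages += 1         # one message per element (naive code)
        return self.data[i]

    def get_block(self, lo, hi):
        self.messages += 1         # one message for the whole block (coalesced)
        return self.data[lo:hi]

remote = RemoteArray(range(100))

# Naive: leaving each remote read alone issues one message per element.
naive = [remote.get(i) for i in range(10)]
naive_msgs = remote.messages       # 10 messages

remote.messages = 0
# Coalesced: a communication-optimizing pass rewrites the loop into one bulk get.
coalesced = remote.get_block(0, 10)
coalesced_msgs = remote.messages   # 1 message

assert naive == coalesced          # same values, far fewer messages
print(naive_msgs, coalesced_msgs)
```

The same idea generalizes to strided and indirect accesses, where the compiler must first prove that the accesses target the same remote locale before merging them.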



    Published In

    LLVM '15: Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC
    November 2015, 74 pages
    ISBN: 9781450340052
    DOI: 10.1145/2833157
    Conference Chair: Hal Finkel

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. Chapel
    2. LLVM
    3. PGAS languages
    4. communication optimizations

    Qualifiers

    • Research-article

    Funding Sources

    • NSF

    Conference

    SC15

    Acceptance Rates

    LLVM '15 Paper Acceptance Rate: 7 of 12 submissions, 58%
    Overall Acceptance Rate: 16 of 22 submissions, 73%
