DOI: 10.1145/3447818.3460424

On the automatic parallelization of subscripted subscript patterns using array property analysis

Published: 04 June 2021

Abstract

Parallelizing loops with subscripted subscript patterns at compile time has long been a challenge for automatic parallelizers. In the class of irregular applications that we have analyzed, the presence of subscripted subscript patterns was one of the primary reasons why a significant number of loops could not be automatically parallelized. Loops with such patterns can be parallelized if the subscript array, or the expression in which the subscript array appears, possesses certain properties, such as monotonicity. The information required to prove the existence of these properties is often present in the application code itself, which suggests that their automatic detection may be feasible. In this paper, we present an algebra for representing and reasoning about subscript array properties, and we discuss a compile-time algorithm, based on symbolic range aggregation, that can prove monotonicity and parallelize key loops. We show that this algorithm can produce significant performance gains, not only in the parallelized loops, but also in the overall applications.
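
To make the pattern concrete, the C sketch below is an illustration of the kind of loop the abstract describes, not code taken from the paper; the names rowptr, y, N, and NNZ are assumptions chosen for the example. The inner loop bounds come from a subscript array, and proving that this array is monotonically non-decreasing is enough to show that different outer iterations write disjoint index ranges, so the outer loop carries no dependence.

    #include <stdio.h>

    #define N   4   /* number of rows (illustrative) */
    #define NNZ 9   /* total number of entries (illustrative) */

    int main(void) {
        /* rowptr is the subscript array. Its monotonicity (rowptr[i] <= rowptr[i+1])
           is the property a compiler must prove to conclude that the index ranges
           written by different iterations of i do not overlap. */
        int rowptr[N + 1] = {0, 2, 5, 7, NNZ};
        double y[NNZ] = {0.0};

        /* Because rowptr is non-decreasing, the ranges y[rowptr[i] .. rowptr[i+1]-1]
           are pairwise disjoint across iterations of i, so the i-loop may safely be
           run in parallel (e.g., with an OpenMP "parallel for" directive). */
        for (int i = 0; i < N; i++) {
            for (int j = rowptr[i]; j < rowptr[i + 1]; j++) {
                y[j] = (double)(i + 1);  /* write through the subscripted range */
            }
        }

        for (int j = 0; j < NNZ; j++)
            printf("%.1f ", y[j]);
        printf("\n");
        return 0;
    }

In real irregular codes, the monotonicity of an array like rowptr is typically established where the array is constructed (for instance, by a prefix-sum loop); that this evidence is present in the application code itself is what makes the compile-time detection described in the abstract feasible.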




Published In

ICS '21: Proceedings of the 35th ACM International Conference on Supercomputing
June 2021
506 pages
ISBN:9781450383356
DOI:10.1145/3447818
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. aggregation
  2. automatic parallelization
  3. monotonicity

Qualifiers

  • Research-article

Conference

ICS '21

Acceptance Rates

ICS '21 Paper Acceptance Rate: 39 of 157 submissions, 25%
Overall Acceptance Rate: 629 of 2,180 submissions, 29%

