DOI: 10.1145/3447818.3460424

On the automatic parallelization of subscripted subscript patterns using array property analysis

Published: 04 June 2021

Abstract

Parallelizing loops with subscripted subscript patterns at compile time has long been a challenge for automatic parallelizers. In the class of irregular applications that we have analyzed, the presence of subscripted subscript patterns was one of the primary reasons why a significant number of loops could not be automatically parallelized. Loops with such patterns can be parallelized if the subscript array, or the expression in which the subscript array appears, possesses certain properties, such as monotonicity. The information required to prove the existence of these properties is often present in the application code itself, which suggests that their automatic detection may be feasible. In this paper, we present an algebra for representing and reasoning about subscript array properties, and we discuss a compile-time algorithm, based on symbolic range aggregation, that can prove monotonicity and parallelize key loops. We show that this algorithm can produce significant performance gains, not only in the parallelized loops, but also in the overall applications.
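
To make the pattern concrete, the C sketch below is an illustration of the kind of loop the abstract describes, not code taken from the paper; the names rowptr, y, N, and NNZ are assumptions chosen for the example. The inner loop bounds come from a subscript array, and proving that this array is monotonically non-decreasing is enough to show that different outer iterations write disjoint index ranges, so the outer loop carries no dependence.

    #include <stdio.h>

    #define N   4   /* number of rows (illustrative) */
    #define NNZ 9   /* total number of entries (illustrative) */

    int main(void) {
        /* rowptr is the subscript array. Its monotonicity (rowptr[i] <= rowptr[i+1])
           is the property a compiler must prove to conclude that the index ranges
           written by different iterations of i do not overlap. */
        int rowptr[N + 1] = {0, 2, 5, 7, NNZ};
        double y[NNZ] = {0.0};

        /* Because rowptr is non-decreasing, the ranges y[rowptr[i] .. rowptr[i+1]-1]
           are pairwise disjoint across iterations of i, so the i-loop may safely be
           run in parallel (e.g., with an OpenMP "parallel for" directive). */
        for (int i = 0; i < N; i++) {
            for (int j = rowptr[i]; j < rowptr[i + 1]; j++) {
                y[j] = (double)(i + 1);  /* write through the subscripted range */
            }
        }

        for (int j = 0; j < NNZ; j++)
            printf("%.1f ", y[j]);
        printf("\n");
        return 0;
    }

In real irregular codes, the monotonicity of an array like rowptr is typically established where the array is constructed (for instance, by a prefix-sum loop); that this evidence is present in the application code itself is what makes the compile-time detection described in the abstract feasible.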




Published In

ICS '21: Proceedings of the 35th ACM International Conference on Supercomputing
June 2021
506 pages
ISBN:9781450383356
DOI:10.1145/3447818
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. aggregation
  2. automatic parallelization
  3. monotonicity

Qualifiers

  • Research-article

Conference

ICS '21

Acceptance Rates

ICS '21 Paper Acceptance Rate: 39 of 157 submissions, 25%
Overall Acceptance Rate: 629 of 2,180 submissions, 29%

