tutorial

Non-affine Extensions to Polyhedral Code Generation

Authors:

Manu Shantharam,

Michelle Mills Strout

CGO '14: Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization

Pages 185 - 194

https://doi.org/10.1145/2544137.2544141

Published: 16 October 2018

Abstract

This paper describes a loop transformation framework that extends a polyhedral representation of loop nests to represent and transform computations with non-affine index arrays in loop bounds and subscripts via a new interface between compile-time and run-time abstractions. Polyhedra scanning code generation, which historically applies an affine mapping to the subscript expressions of the statements in a loop nest, is modified to apply non-affine mappings involving index arrays that are represented at compile time by uninterpreted functions; non-affine loop bounds involving index arrays are also represented. When appropriate, an inspector is utilized to capture the non-affine subscript mappings, and a generalized loop coalescing transformation is introduced as a non-affine transformation to support non-affine loop bounds. With this support, complex sequences of new and existing transformations can then be composed. We demonstrate the effectiveness of this framework by optimizing sparse matrix vector multiplication operations targeting GPUs for different matrix structures and parallelization strategies. This approach achieves performance that is comparable to or greater than the hand-tuned CUSP library; for two of the implementations it achieves an average 1.14× improvement over CUSP across a collection of sparse matrices, while the third performs on average within 8% of CUSP.
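As a rough illustration of the ideas in the abstract, the sketch below shows sparse matrix vector multiplication over a compressed sparse row (CSR) matrix, where the inner loop bounds and the subscript of `x` both go through index arrays, followed by a coalesced version in which an inspector records the row of each nonzero so the executor can run as a single loop. The function names (`spmv_csr`, `inspect_rows`, `spmv_coalesced`) and the array names are assumptions chosen for this example, not identifiers from the paper, and the code assumes `row_ptr[0] == 0` with non-decreasing entries. It is sequential Python, not the GPU code the framework generates.

```python
def spmv_csr(row_ptr, col, val, x):
    """Reference CSR SpMV.

    The inner loop bounds row_ptr[i] .. row_ptr[i + 1] and the subscript
    x[col[j]] both depend on index arrays, so neither is affine in i, j.
    """
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        for j in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += val[j] * x[col[j]]
    return y


def inspect_rows(row_ptr):
    """Inspector (hypothetical helper): for each coalesced iteration k,
    record the row i that the original (i, j) iteration belonged to."""
    row_of = []
    for i in range(len(row_ptr) - 1):
        for _ in range(row_ptr[i], row_ptr[i + 1]):
            row_of.append(i)
    return row_of


def spmv_coalesced(row_ptr, col, val, x):
    """Executor after coalescing the (i, j) nest into one loop over k.

    Assumes row_ptr[0] == 0, so the coalesced index k coincides with the
    original position j in val and col.
    """
    n = len(row_ptr) - 1
    row_of = inspect_rows(row_ptr)  # run-time inspection step
    y = [0.0] * n
    for k in range(len(row_of)):
        y[row_of[k]] += val[k] * x[col[k]]
    return y


# 3x3 example matrix [[1, 0, 2], [0, 0, 0], [0, 3, 4]] in CSR form.
row_ptr = [0, 2, 2, 4]
col = [0, 2, 1, 2]
val = [1.0, 2.0, 3.0, 4.0]
x = [1.0, 2.0, 3.0]

print(spmv_csr(row_ptr, col, val, x))
print(spmv_coalesced(row_ptr, col, val, x))
```

The single loop over `k` has a simple bound (the number of nonzeros), which is the kind of loop that can then be tiled or mapped onto threads; the cost is the extra `row_of` array built once by the inspector.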

Index Terms

Non-affine Extensions to Polyhedral Code Generation
1. Software and its engineering
  1. Software notations and tools
    1. Compilers
    2. General programming languages

Published In

CGO '14: Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization

February 2014

328 pages

ISBN: 9781450326704

DOI: 10.1145/2581122

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Tutorial
Refereed limited

Conference

CGO '14

CGO '14: 12th Annual IEEE/ACM International Symposium on Code Generation and Optimization

February 15 - 19, 2014

Orlando, FL, USA

Acceptance Rates

CGO '14 Paper Acceptance Rate 29 of 100 submissions, 29%;

Overall Acceptance Rate 312 of 1,061 submissions, 29%

Bibliometrics

Total Citations: 25
Total Downloads: 137
Downloads (Last 12 months): 25
Downloads (Last 6 weeks): 0

Reflects downloads up to 20 Jan 2025

Cited By

Laird A, Liu B, Bjørner N, Dehnavi M (2024). SpEQ: Translation of Sparse Codes using Equivalences. Proceedings of the ACM on Programming Languages, 8:PLDI, pages 1680-1703. Online publication date: 20-Jun-2024
https://dl.acm.org/doi/10.1145/3656445
Xu J, Song G, Zhou B, Li F, Hao J, Zhao J, Lee I, Chabbi M, Steuwer M (2024). A Holistic Approach to Automatic Mixed-Precision Code Generation and Tuning for Affine Programs. Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, pages 55-67. Online publication date: 2-Mar-2024
https://dl.acm.org/doi/10.1145/3627535.3638484
Yesil S, Heidarshenas A, Morrison A, Torrellas J, Dehnavi M, Kulkarni M, Krishnamoorthy S (2023). WISE. Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, pages 329-341. Online publication date: 25-Feb-2023
https://dl.acm.org/doi/10.1145/3572848.3577506
Zhao J, Bastoul C, Yi Y, Hu J, Nie W, Zhang R, Geng Z, Li C, Tachon T, Gan Z, Kloeckner A, Moreira J (2022). Parallelizing Neural Network Models Effectively on GPU by Implementing Reductions Atomically. Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, pages 451-466. Online publication date: 8-Oct-2022
https://dl.acm.org/doi/10.1145/3559009.3569656
Niu W, Guan J, Wang Y, Agrawal G, Ren B, Freund S, Yahav E (2021). DNNFusion: accelerating deep neural networks execution with advanced operator fusion. Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, pages 883-898. Online publication date: 19-Jun-2021
https://dl.acm.org/doi/10.1145/3453483.3454083
Yesil S, Heidarshenas A, Morrison A, Torrellas J, Cuicchi C, Qualters I, Kramer W (2020). Speeding up SpMV for power-law graph analytics by enhancing locality & vectorization. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1-15. Online publication date: 9-Nov-2020
https://dl.acm.org/doi/10.5555/3433701.3433815
Yesil S, Heidarshenas A, Morrison A, Torrellas J (2020). Speeding Up SpMV for Power-Law Graph Analytics by Enhancing Locality & Vectorization. SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1-15. Online publication date: Nov-2020
https://doi.org/10.1109/SC41405.2020.00090
Fousek J (2018). Efficient sparse matrix-delayed vector multiplication for discretized neural field model. The Journal of Supercomputing, 74:5, pages 1863-1884. Online publication date: 1-May-2018
https://dl.acm.org/doi/10.1007/s11227-017-2194-4
Cheshmi K, Kamil S, Strout M, Dehnavi M, Mohr B, Raghavan P (2017). Sympiler. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1-13. Online publication date: 12-Nov-2017
https://dl.acm.org/doi/10.1145/3126908.3126936
Sampaio D, Pouchet L, Rastello F, Gropp W, Beckman P, Li Z, Cazorla F (2017). Simplification and runtime resolution of data dependence constraints for loop transformations. Proceedings of the International Conference on Supercomputing, pages 1-11. Online publication date: 14-Jun-2017
https://dl.acm.org/doi/10.1145/3079079.3079098
