Non-affine Extensions to Polyhedral Code Generation

Published: 16 October 2018

Abstract

This paper describes a loop transformation framework that extends a polyhedral representation of loop nests to represent and transform computations with non-affine index arrays in loop bounds and subscripts via a new interface between compile-time and run-time abstractions. Polyhedra scanning code generation, which historically applies an affine mapping to the subscript expressions of the statements in a loop nest, is modified to apply non-affine mappings involving index arrays that are represented at compile time by uninterpreted functions; non-affine loop bounds involving index arrays are also represented. When appropriate, an inspector is utilized to capture the non-affine subscript mappings, and a generalized loop coalescing transformation is introduced as a non-affine transformation to support non-affine loop bounds. With this support, complex sequences of new and existing transformations can then be composed. We demonstrate the effectiveness of this framework by optimizing sparse matrix vector multiplication operations targeting GPUs for different matrix structures and parallelization strategies. This approach achieves performance that is comparable to or greater than the hand-tuned CUSP library; for two of the implementations it achieves an average 1.14× improvement over CUSP across a collection of sparse matrices, while the third performs on average within 8% of CUSP.
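As a rough illustration of the ideas in the abstract (not the paper's generated code), the sketch below shows sparse matrix vector multiplication in compressed sparse row (CSR) form, where the inner loop bounds and the subscript of `x` depend on index arrays. It then shows a hand-written inspector that records the row of each nonzero, so the two-level loop nest can be coalesced into a single loop over nonzeros. All names (`spmv_csr`, `inspect_rows`, `spmv_coalesced`, `row_ptr`, `col_idx`, `row_of`) are assumptions chosen for this example, and the code assumes `row_ptr[0] == 0`.

```python
def spmv_csr(n_rows, row_ptr, col_idx, vals, x):
    """Reference CSR SpMV: y = A * x."""
    y = [0.0] * n_rows
    for i in range(n_rows):
        # Non-affine loop bounds: they are read from the index array row_ptr.
        for j in range(row_ptr[i], row_ptr[i + 1]):
            # Non-affine subscript: x is indexed through the index array col_idx.
            y[i] += vals[j] * x[col_idx[j]]
    return y


def inspect_rows(n_rows, row_ptr):
    """Hypothetical inspector: for each nonzero position k, record its row."""
    row_of = [0] * row_ptr[n_rows]
    for i in range(n_rows):
        for j in range(row_ptr[i], row_ptr[i + 1]):
            row_of[j] = i
    return row_of


def spmv_coalesced(n_rows, col_idx, vals, x, row_of):
    """Executor: the (i, j) nest coalesced into one loop over nonzeros k."""
    y = [0.0] * n_rows
    for k in range(len(vals)):
        y[row_of[k]] += vals[k] * x[col_idx[k]]
    return y


# Example matrix: [[1, 0, 2], [0, 3, 0], [4, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
vals = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0]

row_of = inspect_rows(3, row_ptr)
y_ref = spmv_csr(3, row_ptr, col_idx, vals, x)
y_coalesced = spmv_coalesced(3, col_idx, vals, x, row_of)
```

The inspector runs once per matrix structure, and its cost is amortized when the same matrix is multiplied many times. The coalesced loop has a single affine bound (the number of nonzeros), which is the kind of shape that is easier to map onto GPU threads; the actual parallelization strategies evaluated in the paper are not reproduced here.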



      Published In

      CGO '14: Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
      February 2014
      328 pages
      ISBN:9781450326704
      DOI:10.1145/2581122

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. code generation
      2. inspector/executor
      3. loop coalescing
      4. non-affine
      5. polyhedral model

      Qualifiers

      • Tutorial
      • Refereed limited

      Conference

      CGO '14

      Acceptance Rates

      CGO '14 Paper Acceptance Rate 29 of 100 submissions, 29%;
      Overall Acceptance Rate 312 of 1,061 submissions, 29%
