DOI: 10.5555/3314872.3314894

Tensor algebra compilation with workspaces

Published: 16 February 2019

Abstract

This paper shows how to extend sparse tensor algebra compilers to introduce temporary tensors, called workspaces, that avoid inefficient accesses into sparse data structures. We develop an intermediate representation (IR) for tensor operations called concrete index notation that specifies when sub-computations occur and where they are stored. We then describe the workspace transformation in this IR, how to programmatically invoke it, and how the IR is compiled to sparse code. Finally, we show how the transformation can be used to optimize sparse tensor kernels, including sparse matrix multiplication, sparse tensor addition, and the matricized tensor times Khatri-Rao product (MTTKRP).
Our results show that the workspace transformation brings the performance of these kernels on par with hand-optimized implementations. For example, we improve the performance of MTTKRP with dense output by up to 35%, and enable generating sparse matrix multiplication and MTTKRP with sparse output, neither of which was supported by prior tensor algebra compilers.
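To make the workspace idea concrete, the sketch below illustrates it for sparse matrix multiplication in the style of Gustavson's algorithm: a dense temporary row (the workspace) accumulates scattered partial products, so the kernel never has to search or insert into a sparse output row mid-computation. This is an illustrative sketch written for this summary, not the paper's generated code; all function and variable names here are our own.

```python
def csr_matmul_with_workspace(A_pos, A_idx, A_val, B_pos, B_idx, B_val, n_cols):
    """Multiply two CSR matrices (C = A @ B) using a dense workspace row.

    Each matrix is given by its CSR arrays: row offsets (pos), column
    indices (idx), and values (val). Returns C in the same CSR form.
    """
    C_pos, C_idx, C_val = [0], [], []
    workspace = [0.0] * n_cols            # dense temporary: the workspace
    for i in range(len(A_pos) - 1):
        touched = set()                   # columns written in this row
        for pA in range(A_pos[i], A_pos[i + 1]):
            k, a = A_idx[pA], A_val[pA]
            for pB in range(B_pos[k], B_pos[k + 1]):
                j = B_idx[pB]
                workspace[j] += a * B_val[pB]   # random access is cheap: dense
                touched.add(j)
        for j in sorted(touched):         # gather the row into sparse C, reset
            C_idx.append(j)
            C_val.append(workspace[j])
            workspace[j] = 0.0
        C_pos.append(len(C_idx))
    return C_pos, C_idx, C_val
```

The key point is that inserting into the dense workspace is O(1), whereas appending out-of-order coordinates directly into a sparse output row would require searching or re-sorting; the workspace is gathered back into sparse form once per row.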



Published In

CGO 2019: Proceedings of the 2019 IEEE/ACM International Symposium on Code Generation and Optimization
February 2019
286 pages
ISBN:9781728114361

Publisher

IEEE Press


Author Tags

  1. code optimization
  2. concrete index notation
  3. sparse tensor algebra
  4. temporaries
  5. workspaces

Acceptance Rates

Overall Acceptance Rate 312 of 1,061 submissions, 29%
