DOI: 10.5555/3314872.3314894

Tensor algebra compilation with workspaces

Published: 16 February 2019

Abstract

This paper shows how to extend sparse tensor algebra compilers to introduce temporary tensors, called workspaces, that avoid inefficient accesses into sparse data structures. We develop an intermediate representation (IR) for tensor operations called concrete index notation that specifies when sub-computations occur and where they are stored. We then describe the workspace transformation in this IR, how to programmatically invoke it, and how the IR is compiled to sparse code. Finally, we show how the transformation can be used to optimize sparse tensor kernels, including sparse matrix multiplication, sparse tensor addition, and the matricized tensor times Khatri-Rao product (MTTKRP).
Our results show that the workspace transformation brings the performance of these kernels on par with hand-optimized implementations. For example, we improve the performance of MTTKRP with dense output by up to 35%, and enable generating sparse matrix multiplication and MTTKRP with sparse output, neither of which was supported by prior tensor algebra compilers.
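To make the workspace idea concrete, the sketch below illustrates it for sparse matrix multiplication in the style of Gustavson's algorithm: a dense temporary row (the workspace) accumulates scattered partial products, so the kernel never has to search or insert into a sparse output row mid-computation. This is an illustrative sketch written for this summary, not the paper's generated code; all function and variable names here are our own.

```python
def csr_matmul_with_workspace(A_pos, A_idx, A_val, B_pos, B_idx, B_val, n_cols):
    """Multiply two CSR matrices (C = A @ B) using a dense workspace row.

    Each matrix is given by its CSR arrays: row offsets (pos), column
    indices (idx), and values (val). Returns C in the same CSR form.
    """
    C_pos, C_idx, C_val = [0], [], []
    workspace = [0.0] * n_cols            # dense temporary: the workspace
    for i in range(len(A_pos) - 1):
        touched = set()                   # columns written in this row
        for pA in range(A_pos[i], A_pos[i + 1]):
            k, a = A_idx[pA], A_val[pA]
            for pB in range(B_pos[k], B_pos[k + 1]):
                j = B_idx[pB]
                workspace[j] += a * B_val[pB]   # random access is cheap: dense
                touched.add(j)
        for j in sorted(touched):         # gather the row into sparse C, reset
            C_idx.append(j)
            C_val.append(workspace[j])
            workspace[j] = 0.0
        C_pos.append(len(C_idx))
    return C_pos, C_idx, C_val
```

The key point is that inserting into the dense workspace is O(1), whereas appending out-of-order coordinates directly into a sparse output row would require searching or re-sorting; the workspace is gathered back into sparse form once per row.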



Published In

CGO 2019: Proceedings of the 2019 IEEE/ACM International Symposium on Code Generation and Optimization
February 2019
286 pages
ISBN:9781728114361

Publisher

IEEE Press


Author Tags

  1. code optimization
  2. concrete index notation
  3. sparse tensor algebra
  4. temporaries
  5. workspaces

Acceptance Rates

Overall Acceptance Rate 312 of 1,061 submissions, 29%
