research-article
Open access

Mosaic: An Interoperable Compiler for Tensor Algebra

Published: 06 June 2023

Abstract

We introduce Mosaic, a sparse tensor algebra compiler that can bind tensor expressions to external functions of other tensor algebra libraries and compilers. Users can extend Mosaic by adding new functions and bind a sub-expression to a function using a scheduling API. Mosaic substitutes the bound sub-expressions with calls to the external functions and automatically generates the remaining code using a default code generator. As the generated code is fused by default, users can productively leverage both fusion and calls to specialized functions within the same compiler. We demonstrate the benefits of our dual approach by showing that calling hand-written CPU and specialized hardware functions can provide speedups of up to 206× against fused code in some cases, while generating fused code can provide speedups of up to 3.57× against code that calls external functions in other cases. Mosaic also offers a search system that can automatically map an expression to a set of registered external functions. Both the explicit binding and automatic search are verified by Mosaic. Additionally, the interface for adding new external functions is simple and general. Currently, 38 external functions have been added to Mosaic, with each addition averaging 20 lines of code.
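The abstract's central idea, that a sub-expression can either be fused into generated code or bound to an external library kernel, and that both paths must compute the same result, can be illustrated with a deliberately simplified sketch. This is not Mosaic's actual API; the function names (`spmv`, `evaluate_fused`, `evaluate_bound`) and the list-of-lists sparse format are hypothetical stand-ins chosen for a self-contained example of evaluating y = A·x + b.

```python
def spmv(rows, x):
    # Stand-in for an external library kernel (e.g., a vendor SpMV).
    # `rows` is a list of (column_index, value) pairs per sparse matrix row.
    return [sum(v * x[j] for j, v in row) for row in rows]

def evaluate_fused(rows, x, b):
    # Default code generator's output: one fused loop over y = A*x + b,
    # with no temporary tensor materialized for the A*x sub-expression.
    return [sum(v * x[j] for j, v in row) + b[i]
            for i, row in enumerate(rows)]

def evaluate_bound(rows, x, b):
    # The A*x sub-expression is bound to the external function; only the
    # remaining elementwise addition is generated by the default path.
    t = spmv(rows, x)
    return [t[i] + b[i] for i in range(len(b))]

# A sparse 2x2 matrix: row 0 holds A[0][0]=2.0; row 1 holds A[1][0]=1.0, A[1][1]=3.0.
rows = [[(0, 2.0)], [(0, 1.0), (1, 3.0)]]
x, b = [1.0, 2.0], [0.5, 0.5]
assert evaluate_fused(rows, x, b) == evaluate_bound(rows, x, b)  # both give [2.5, 7.5]
```

The equality assertion mirrors, in miniature, the verification Mosaic performs: an explicit binding or an automatically discovered mapping must be semantically equivalent to the fused code it replaces.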



Published In

Proceedings of the ACM on Programming Languages, Volume 7, Issue PLDI
June 2023, 2020 pages
EISSN: 2475-1421
DOI: 10.1145/3554310
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. automated search
  2. compilation
  3. external functions
  4. sparse tensor algebra

Qualifiers

  • Research-article


Article Metrics

  • Downloads (Last 12 months)1,892
  • Downloads (Last 6 weeks)147
Reflects downloads up to 14 Sep 2024

