DOI: 10.1145/3649153.3649187
Research article · Open access

Sparse MTTKRP Acceleration for Tensor Decomposition on GPU

Published: 02 July 2024

Abstract

Sparse Matricized Tensor Times Khatri-Rao Product (spMTTKRP) is the bottleneck kernel of sparse tensor decomposition. In this work, we propose a GPU-based algorithm design that addresses the key challenges in accelerating spMTTKRP: (1) eliminating global atomic operations across GPU thread blocks, (2) avoiding the communication of intermediate values between GPU thread blocks and GPU global memory, and (3) ensuring a balanced distribution of workloads across GPU thread blocks. Our approach also supports dynamic tensor remapping, which enables these optimizations in every mode of the input tensor. Across widely used datasets, our approach achieves geometric mean speedups of 1.5×, 2.0×, and 21.7× in total execution time compared with the state-of-the-art GPU implementations. Our work is the only GPU implementation that supports tensors with more than four modes, since the state-of-the-art works have implementation constraints for tensors with a large number of modes.
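To make the kernel concrete, the following is a minimal sequential sketch in Python, not the authors' GPU code. For output mode n of an N-mode tensor stored in coordinate (COO) format, each nonzero x at (i_1, ..., i_N) adds x times the elementwise (Hadamard) product of the factor-matrix rows selected by all other modes into output row i_n. The second function illustrates, under our own assumptions, one simple way to approach challenges (1) and (3): nonzeros are grouped by output index, each group is owned by exactly one simulated thread block so that no two blocks write the same output row (no cross-block atomics), and groups are assigned largest-first to the least-loaded block to balance nonzero counts. The names `krp_row`, `mttkrp_coo`, `partition_by_output_index`, `mttkrp_partitioned`, and the `num_blocks` parameter are hypothetical; the paper's actual partitioning and remapping scheme may differ.

```python
from collections import defaultdict
import heapq


def krp_row(coord, val, factors, mode, rank):
    """Scale the Hadamard product of all non-output factor rows by one nonzero value."""
    row = [val] * rank
    for n, idx in enumerate(coord):
        if n == mode:
            continue  # the output mode supplies the row index, not a factor row
        f = factors[n][idx]
        for r in range(rank):
            row[r] *= f[r]
    return row


def mttkrp_coo(indices, values, factors, mode, rank):
    """Sequential reference spMTTKRP for an N-mode COO tensor (any N >= 2)."""
    out = [[0.0] * rank for _ in range(len(factors[mode]))]
    for coord, val in zip(indices, values):
        row = krp_row(coord, val, factors, mode, rank)
        dst = out[coord[mode]]
        for r in range(rank):
            dst[r] += row[r]
    return out


def partition_by_output_index(indices, mode, num_blocks):
    """Hypothetical scheme: assign whole output-index groups to blocks, largest first.

    Each output row is owned by exactly one block, so blocks never update the same row.
    """
    groups = defaultdict(list)
    for nz, coord in enumerate(indices):
        groups[coord[mode]].append(nz)
    heap = [(0, b) for b in range(num_blocks)]  # (assigned nonzeros, block id)
    heapq.heapify(heap)
    blocks = [[] for _ in range(num_blocks)]
    for out_idx in sorted(groups, key=lambda k: len(groups[k]), reverse=True):
        load, b = heapq.heappop(heap)  # least-loaded block so far
        blocks[b].append(out_idx)
        heapq.heappush(heap, (load + len(groups[out_idx]), b))
    return blocks, groups


def mttkrp_partitioned(indices, values, factors, mode, rank, num_blocks):
    """Same result as mttkrp_coo, computed block by block with no shared output rows."""
    out = [[0.0] * rank for _ in range(len(factors[mode]))]
    blocks, groups = partition_by_output_index(indices, mode, num_blocks)
    for owned in blocks:  # each iteration stands in for one independent thread block
        for out_idx in owned:
            acc = [0.0] * rank  # block-local accumulator for one output row
            for nz in groups[out_idx]:
                row = krp_row(indices[nz], values[nz], factors, mode, rank)
                for r in range(rank):
                    acc[r] += row[r]
            out[out_idx] = acc  # one plain write per row, so no atomics are needed
    return out


if __name__ == "__main__":
    # 2 x 3 x 2 tensor with four nonzeros and rank-2 factor matrices
    indices = [(0, 0, 0), (0, 2, 1), (1, 1, 0), (1, 2, 1)]
    values = [1.0, 2.0, 3.0, 4.0]
    A = [[1.0, 0.0], [0.0, 1.0]]
    B = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    C = [[1.0, 1.0], [2.0, 0.5]]
    print(mttkrp_coo(indices, values, [A, B, C], mode=0, rank=2))  # [[21.0, 8.0], [49.0, 24.0]]
    print(mttkrp_partitioned(indices, values, [A, B, C], mode=0, rank=2, num_blocks=2))
```

Both calls produce the same rows. Because the code loops over coordinate tuples of any length, the same sketch handles tensors with more than four modes. One limitation of this simple ownership scheme is that a single output index with very many nonzeros cannot be split across blocks, so a real GPU kernel may need a more elaborate distribution than this greedy heuristic.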

Information

Published In

ACM Conferences
CF '24: Proceedings of the 21st ACM International Conference on Computing Frontiers
May 2024
345 pages
ISBN:9798400705977
DOI:10.1145/3649153
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. GPU
  2. Tensor Decomposition
  3. spMTTKRP

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • NSF
  • DARPA

Conference

CF '24

Acceptance Rates

CF '24 Paper Acceptance Rate 33 of 105 submissions, 31%;
Overall Acceptance Rate 273 of 785 submissions, 35%

Bibliometrics

Article Metrics (reflects downloads up to 04 Feb 2025)

  • Total citations: 0
  • Total downloads: 190
  • Downloads (last 12 months): 190
  • Downloads (last 6 weeks): 42