DOI: 10.1145/3458817.3476224

ndzip-gpu: efficient lossless compression of scientific floating-point data on GPUs

Published: 13 November 2021 Publication History

Abstract

Lossless data compression is a promising software approach for reducing the bandwidth requirements of scientific applications on accelerator clusters without introducing approximation errors. Suitable compressors must be able to effectively compact floating-point data while saturating the system interconnect to avoid introducing unnecessary latencies.
We present ndzip-gpu, a novel, highly efficient GPU parallelization scheme for the block compressor ndzip, which has recently set a new milestone in CPU floating-point compression speeds.
Through the combination of intra-block parallelism and efficient memory access patterns, ndzip-gpu achieves high resource utilization in decorrelating multi-dimensional data via the Integer Lorenzo Transform. We further introduce a novel, efficient warp-cooperative primitive for vertical bit packing, providing a high-throughput data reduction and expansion step.
Using a representative set of scientific data, we compare the performance of ndzip-gpu against five existing GPU compressors. While the effectiveness of any compressor strongly depends on the characteristics of the dataset, we demonstrate that ndzip-gpu offers the best average compression ratio for the examined data. On Nvidia Turing, Volta and Ampere hardware, it achieves the highest single-precision throughput by a significant margin while maintaining a favorable trade-off between data reduction and throughput in the double-precision case.
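The abstract names two building blocks, an integer Lorenzo transform for decorrelation and vertical bit packing for data reduction, without describing them. The following is a minimal, sequential Python sketch of both ideas under simplifying assumptions: it is restricted to one dimension and single precision, it omits any sign handling of residuals, and it is not the warp-cooperative GPU scheme of the paper. All function names are hypothetical and chosen for illustration only.

```python
import struct

MASK32 = 0xFFFFFFFF  # all arithmetic is done modulo 2^32


def float_bits(values):
    """Reinterpret single-precision floats as 32-bit unsigned integers."""
    return [struct.unpack("<I", struct.pack("<f", v))[0] for v in values]


def bits_to_floats(words):
    """Inverse of float_bits."""
    return [struct.unpack("<f", struct.pack("<I", w))[0] for w in words]


def lorenzo_forward_1d(words):
    """1-D integer Lorenzo transform: difference to the predecessor."""
    out, prev = [], 0
    for w in words:
        out.append((w - prev) & MASK32)
        prev = w
    return out


def lorenzo_inverse_1d(residuals):
    """Inverse transform: a running (prefix) sum modulo 2^32."""
    out, acc = [], 0
    for r in residuals:
        acc = (acc + r) & MASK32
        out.append(acc)
    return out


def transpose_32x32(words):
    """Bit-matrix transpose: output word b collects bit b of every input."""
    assert len(words) == 32
    out = [0] * 32
    for i, w in enumerate(words):
        for b in range(32):
            if (w >> b) & 1:
                out[b] |= 1 << i
    return out


def pack_chunk(residuals):
    """Vertical bit packing: keep only non-zero bit columns.

    Returns a 32-bit header marking the kept columns and the columns.
    """
    header, body = 0, []
    for b, column in enumerate(transpose_32x32(residuals)):
        if column:
            header |= 1 << b
            body.append(column)
    return header, body


def unpack_chunk(header, body):
    """Re-insert zero columns, then transpose back (self-inverse)."""
    it = iter(body)
    columns = [next(it) if (header >> b) & 1 else 0 for b in range(32)]
    return transpose_32x32(columns)


if __name__ == "__main__":
    values = [1.0 + i * 0.125 for i in range(32)]  # smooth sample data
    header, body = pack_chunk(lorenzo_forward_1d(float_bits(values)))
    restored = bits_to_floats(lorenzo_inverse_1d(unpack_chunk(header, body)))
    print(len(body), "of 32 columns stored; lossless:", restored == values)
```

The sketch illustrates why the two steps combine well: for smooth data the residuals share many zero bit positions, so entire transposed columns vanish and only a one-word header plus the remaining columns need to be stored. A multi-dimensional variant would apply the difference step along each axis in turn.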

Supplementary Material

Presentation video (MP4 file): ndzip-gpu: Efficient Lossless Compression of Scientific Floating-Point Data on GPUs


Information

Published In

SC '21: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
November 2021
1493 pages
ISBN:9781450384421
DOI:10.1145/3458817
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

In-Cooperation

  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. accelerator
  2. data compression
  3. floating-point
  4. gpgpu

Qualifiers

  • Research-article

Funding Sources

  • EuroHPC Joint Undertaking
  • FWF

Conference

SC '21
Acceptance Rates

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%
