DOI: 10.1145/3538712.3538717
research-article

Region-adaptive, Error-controlled Scientific Data Compression using Multilevel Decomposition

Published: 23 August 2022

Abstract

The increase in computer processing speed is significantly outpacing improvements in network and storage bandwidth, leading to the big data challenge in modern science, where scientific applications can quickly generate far more data than can be transferred and stored. As a result, big scientific data must be reduced by a few orders of magnitude, while the accuracy of the reduced data must be guaranteed for further scientific exploration. Moreover, scientists are often interested in specific spatial/temporal regions of their data, where higher accuracy is required. The locations of these regions can sometimes be prescribed from application knowledge; at other times they must be estimated from general spatial/temporal variation. In this paper, we develop a novel multilevel approach that allows users to impose region-wise compression error bounds. Our method uses the byproduct of a multilevel compressor to detect regions that are rich in detail, and we provide the theoretical underpinning for region-wise error control. With spatially varying precision preservation, our approach achieves significantly higher compression ratios than single-error-bounded compression approaches while controlling errors in the regions of interest.
We evaluate the approach on two climate use cases: one targeting small-scale, node features and the other focusing on long, areal features. For both use cases, the locations of the features were unknown before compression. By selecting approximately 16% of the data based on multi-scale spatial variations and compressing those regions with smaller error tolerances than the rest, our approach improves the accuracy of post-analysis by approximately 2× compared to single-error-bounded compression at the same compression ratio. Using the same error bound for the region of interest, our approach increases the overall compression ratio by more than 50%.
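
The region-adaptive idea described above can be illustrated with a minimal sketch. It is not the paper's algorithm and does not use any real compressor's API. The one-level linear-interpolation detail coefficients, the block size, the uniform quantizer, the two tolerances, the synthetic signal, and all function names are assumptions chosen only for illustration. The sketch ranks blocks by the magnitude of their detail coefficients, marks roughly 16% of the blocks as regions of interest, and quantizes those blocks with a tighter error tolerance than the rest.

```python
import math


def detail_coefficients(values):
    """One-level 1D details: each odd-index value minus the linear
    interpolation of its two even-index neighbours. This is a simplified
    stand-in for the coefficients a multilevel compressor produces."""
    details = [0.0] * len(values)
    for i in range(1, len(values) - 1, 2):
        details[i] = values[i] - 0.5 * (values[i - 1] + values[i + 1])
    return details


def select_blocks(details, block, fraction):
    """Return the indices of the blocks with the largest mean |detail|,
    keeping about `fraction` of all blocks (at least one)."""
    n_blocks = math.ceil(len(details) / block)
    scores = []
    for b in range(n_blocks):
        chunk = details[b * block:(b + 1) * block]
        scores.append(sum(abs(d) for d in chunk) / len(chunk))
    n_keep = max(1, round(fraction * n_blocks))
    ranked = sorted(range(n_blocks), key=lambda b: scores[b], reverse=True)
    return set(ranked[:n_keep])


def quantize(values, roi_blocks, block, tol_roi, tol_rest):
    """Uniform scalar quantization with a per-block tolerance. Rounding to
    the nearest multiple of 2*tol keeps the pointwise error within tol."""
    recon = []
    for i, v in enumerate(values):
        tol = tol_roi if i // block in roi_blocks else tol_rest
        recon.append(round(v / (2 * tol)) * 2 * tol)
    return recon


def max_error(values, recon, roi_blocks, block, inside):
    """Largest pointwise error inside (or outside) the selected blocks."""
    return max(
        abs(v - r)
        for i, (v, r) in enumerate(zip(values, recon))
        if ((i // block) in roi_blocks) == inside
    )


# Synthetic field (assumed): a smooth sine with a rapidly oscillating
# feature confined to indices 96..143.
N, BLOCK = 256, 16
values = [
    math.sin(2 * math.pi * i / N)
    + (0.5 * math.sin(math.pi * i / 2) if 96 <= i < 144 else 0.0)
    for i in range(N)
]

details = detail_coefficients(values)
roi = select_blocks(details, BLOCK, fraction=0.16)
recon = quantize(values, roi, BLOCK, tol_roi=1e-3, tol_rest=1e-1)

print("selected blocks:", sorted(roi))
print("max error in ROI:", max_error(values, recon, roi, BLOCK, True))
print("max error elsewhere:", max_error(values, recon, roi, BLOCK, False))
```

In this toy setting the selected blocks coincide with the oscillating feature, and each region respects its own tolerance. A real multilevel compressor would derive the detail coefficients from its full decomposition and would follow quantization with entropy coding, both of which this sketch omits.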


Published In

SSDBM '22: Proceedings of the 34th International Conference on Scientific and Statistical Database Management
July 2022
201 pages
ISBN: 9781450396677
DOI: 10.1145/3538712

Publisher

Association for Computing Machinery
New York, NY, United States

Author Tags

1. Climate Data Compression
2. Error Control
3. Region-adaptive Lossy Compression

Qualifiers

• Research-article
• Research
• Refereed limited

Funding Sources

• Department of Energy ASCR

Conference

SSDBM 2022

Acceptance Rates

Overall Acceptance Rate 56 of 146 submissions, 38%
