skip to main content
10.1145/3448016.3457563acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

Bringing Cloud-Native Storage to SAP IQ

Published: 18 June 2021 Publication History

Abstract

In this paper, we describe our journey of transforming SAP IQ into a relational database management system (RDBMS) that utilizes cheap, elastically scalable object stores on the cloud. SAP IQ is a three-decade old, disk-based, columnar RDBMS that is optimized for complex online analytical processing (OLAP) workloads. Traditionally, SAP IQ has been designed to operate on shared storage devices with strong consistency guarantees (e.g., high-caliber storage area network devices). Therefore, deploying SAP IQ on the cloud, as is, would have meant utilizing storage solutions such as NetApp or AWS EFS that provide a POSIX compliant file interface and strong consistency guarantees, but at a much higher monetary cost. These costs can accumulate easily to diminish the economies of scale that one would expect on the cloud, which can be undesirable. Instead, we have enhanced the design of SAP IQ to operate on cloud object stores such as AWS S3 and Azure Blob Storage. Object stores rely on a weaker consistency model, and potentially have higher latency; however, because of these design trade-offs, they are able to offer (i) better pricing, (ii) enhanced durability, (iii) improved elasticity, and (iv) higher throughput. By enhancing SAP IQ to operate under these design trade-offs, we have unlocked many of the opportunities offered by object stores. More specifically, we have extended SAP IQ's buffer manager and transaction manager, and have introduced a new caching layer that utilizes instance storage on AWS EC2. Experiments using the TPC-H benchmark demonstrate that we can gain an order of magnitude reduction in data-at rest storage costs while improving query and load performance.

Supplementary Material

MP4 File (3448016.3457563.mp4)
In this paper, we describe our journey of transforming SAP IQ into a relational database management system (RDBMS) that utilizes cheap, elastically scalable object stores on the cloud. SAP IQ is a three-decade old, disk-based, columnar RDBMS that is optimized for complex online analytical processing (OLAP) workloads. Traditionally, SAP IQ has been designed to operate on shared storage devices with strong consistency guarantees (e.g., high-caliber storage area network devices). Therefore, deploying SAP IQ on the cloud, as is, would have meant utilizing storage solutions such as NetApp or AWS EFS that provide a POSIX compliant file interface and strong consistency guarantees, but at a much higher monetary cost. These costs can accumulate easily to diminish the economies of scale that one would expect on the cloud, which can be undesirable. Instead, we have enhanced the design of SAP IQ to operate on cloud object stores such as AWS S3 and Azure Blob Storage. Object stores rely on a weaker consistency model, and potentially have higher latency; however, because of these design trade-offs, they are able to offer (i) better pricing, (ii) enhanced durability, (iii) improved elasticity, and (iv) higher throughput. By enhancing SAP IQ to operate under these design trade-offs, we have unlocked many of the opportunities offered by object stores. More specifically, we have extended SAP IQ's buffer manager and transaction manager, and have introduced a new caching layer that utilizes instance storage on AWS EC2. Experiments using the TPC-H benchmark demonstrate that we can gain an order of magnitude reduction in data at-rest storage costs while improving query and load performance.

References

[1]
Alibaba Object Storage Service. https://www.alibabacloud.com/product/oss/.
[2]
Amazon Elastic Block Store (EBS). https://aws.amazon.com/ebs/.
[3]
Amazon Elastic Compute Cloud (EC2). https://aws.amazon.com/ec2/.
[4]
Amazon Elastic File System (EFS). https://aws.amazon.com/efs/.
[5]
Amazon Simple Storage Service (S3). http://aws.amazon.com/s3/.
[6]
Amazon Simple Workflow Service (SWF). https://aws.amazon.com/swf/.
[7]
Apache Hudi. https://hudi.apache.org/.
[8]
Apache Parquet. https://parquet.apache.org/.
[9]
Apache Software Foundation, Hadoop. https://hadoop.apache.org/.
[10]
Apache Spark -- Lightning-fast unified analytics engine. https://spark.apache.org/.
[11]
Azure Blob Storage. https://azure.microsoft.com/en-us/services/storage/blobs/.
[12]
Best practices design patterns: Optimizing Amazon S3 performance. https://docs.aws.amazon.com/AmazonS3/latest/dev/optimizing-performance.html.
[13]
Dremio. https://www.dremio.com/product/.
[14]
Google BigQuery. https://cloud.google.com/bigquery/.
[15]
Google Cloud Storage. https://cloud.google.com/storage/.
[16]
NetApp -- Data Management Solutions for the Cloud. https://www.netapp.com/.
[17]
SAP HANA Cloud. https://www.sap.com/products/hana/cloud.html.
[18]
SAP IQ. https://www.sap.com/canada/products/sybase-iq-big-data-management.html.
[19]
SAP IQ Performance and Tuning Guide -- Zone Maps. https://help.sap.com/viewer/a8982cc084f21015a7b4b7fcdeb0953d/16.1.3.0/en-US/6604c2567d66453da391dee00dcf5d5c.html.
[20]
TPC Benchmark H (decision support) standard specification. http://www.tpc.org/tpch/.
[21]
A. Agarwal, S. A. Kirk, B. French, N. Marathe, S. Mungikar, and K. Mittal. Tiered index management, 2013. US Patent 10061792B2.
[22]
A. Ailamaki, D. J. DeWitt, and M. D. Hill. Data page layouts for relational databases on deep memory hierarchies. VLDB J., 11(3):198--215, 2002.
[23]
P. Antonopoulos, A. Budovski, C. Diaconu, A. H. Saenz, J. Hu, H. Kodavalla, D. Kossmann, S. Lingam, U. F. Minhas, N. Prakash, V. Purohit, H. Qu, C. S. Ravella, K. Reisteter, S. Shrotri, D. Tang, and V. Wakade. Socrates: The new SQL server in the cloud. In Proceedings of the 2019 ACM International Conference on Management of Data, SIGMOD, pages 1743--1756, 2019.
[24]
M. Armbrust, T. Das, S. Paranjpye, R. Xin, S. Zhu, A. Ghodsi, B. Yavuz, M. Murthy, J. Torres, L. Sun, P. A. Boncz, M. Mokhtar, H. V. Hovell, A. Ionescu, A. Luszczak, M. Switakowski, T. Ueshin, X. Li, M. Szafranski, P. Senster, and M. Zaharia. Delta lake: High-performance ACID table storage over cloud object stores. Proc. VLDB Endow., 13(12):3411--3424, 2020.
[25]
H. Berenson, P. A. Bernstein, J. Gray, J. Melton, E. J. O'Neil, and P. E. O'Neil. A critique of ANSI SQL isolation levels. In Proceedings of the 1995 ACM International Conference on Management of Data, SIGMOD, pages 1--10, 1995.
[26]
J. Camacho-Rodr'i guez, A. Chauhan, A. Gates, E. Koifman, O. O'Malley, V. Garg, Z. Haindrich, S. Shelukhin, P. Jayachandran, S. Seth, D. Jaiswal, S. Bouguerra, N. Bangarwa, S. Hariappan, A. Agarwal, J. Dere, D. Dai, T. Nair, N. Dembla, G. Vijayaraghavan, and G. Hagleitner. Apache Hive: From MapReduce to enterprise-grade big data warehousing. In Proceedings of the 2019 ACM International Conference on Management of Data, SIGMOD, pages 1773--1786, 2019.
[27]
C. Y. Chan and Y. E. Ioannidis. Bitmap index design and evaluation. In Proceedings of the 1998 ACM International Conference on Management of Data, SIGMOD, pages 355--366, 1998.
[28]
D. Comer. The ubiquitous b-tree. ACM Comput. Surv., 11(2):121--137, 1979.
[29]
B. F. Cooper, P. P. S. Narayan, R. Ramakrishnan, U. Srivastava, A. Silberstein, P. Bohannon, H. Jacobsen, N. Puz, D. Weaver, and R. Yerneni. PNUTS to sherpa: Lessons from yahoo!'s cloud database. Proc. VLDB Endow., 12(12):2300--2307, 2019.
[30]
B. Dageville, T. Cruanes, M. Zukowski, V. Antonov, A. Avanes, J. Bock, J. Claybaugh, D. Engovatov, M. Hentschel, J. Huang, A. W. Lee, A. Motivala, A. Q. Munir, S. Pelley, P. Povinec, G. Rahn, S. Triantafyllis, and P. Unterbrunner. The snowflake elastic data warehouse. In Proceedings of the 2016 ACM International Conference on Management of Data, SIGMOD, pages 215--226. ACM, 2016.
[31]
S. Das, M. Grbic, I. Ilic, I. Jovandic, A. Jovanovic, V. R. Narasayya, M. Radulovic, M. Stikic, G. Xu, and S. Chaudhuri. Automatically indexing millions of databases in microsoft azure SQL database. In Proceedings of the 2019 International Conference on Management of Data, SIGMOD, pages 666--679, 2019.
[32]
J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. In Proceedings of the 6th Symposium on Operating System Design and Implementation, OSDI, pages 137--150, 2004.
[33]
G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels. Dynamo: Amazon's highly available key-value store. In Proceedings of the 2007 ACM Symposium on Operating Systems Principles, SOSP, pages 205--220, 2007.
[34]
F. F"a rber, S. K. Cha, J. Primsch, C. Bornhö vd, S. Sigg, and W. Lehner. SAP HANA database: data management for modern business applications. SIGMOD Rec., 40(4):45--51, 2011.
[35]
A. Gupta, D. Agarwal, D. Tan, J. Kulesza, R. Pathak, S. Stefani, and V. Srinivasan. Amazon Redshift and the case for simpler data warehouses. In Proceedings of the 2015 ACM International Conference on Management of Data, SIGMOD, pages 1917--1923, 2015.
[36]
S. Idreos, F. Groffen, N. Nes, S. Manegold, K. S. Mullender, and M. L. Kersten. MonetDB: Two decades of research in column-oriented database architectures. IEEE Data Eng. Bull., 35(1):40--45, 2012.
[37]
C. Jia and H. Li. Virtual distributed file system: Alluxio. In Encyclopedia of Big Data Technologies. Springer, 2019.
[38]
D. E. Knuth. The Art of Computer Programming, Volume III: Sorting and Searching. Addison-Wesley, 1973.
[39]
A. Lamb, M. Fuller, R. Varadarajan, N. Tran, B. Vandier, L. Doshi, and C. Bear. The vertica analytic database: C-store 7 years later. Proc. VLDB Endow., 5(12):1790--1801, 2012.
[40]
F. Li. Cloud native database systems at alibaba: Opportunities and challenges. Proc. VLDB Endow., 12(12):2263--2272, 2019.
[41]
M. Matsumoto and T. Nishimura. Mersenne twister: A 623-dimensionally equidistributed uniform pseudo-random number generator. ACM Trans. Model. Comput. Simul., 8(1):3--30, 1998.
[42]
S. Mungikar and B. French. Smart pre-fetch for sequential access on BTree, 2013. US Patent 9552298B2.
[43]
T. Neumann. Efficiently compiling efficient query plans for modern hardware. Proc. VLDB Endow., 4(9):539--550, 2011.
[44]
R. Ramakrishnan, B. Sridharan, J. R. Douceur, P. Kasturi, B. Krishnamachari-Sampath, K. Krishnamoorthy, P. Li, M. Manu, S. Michaylov, R. Ramos, N. Sharman, Z. Xu, Y. Barakat, C. Douglas, R. Draves, S. S. Naidu, S. Shastry, A. Sikaria, S. Sun, and R. Venkatesan. Azure data lake store: A hyperscale distributed file service for big data analytics. In Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD, pages 51--63, 2017.
[45]
G. M. Sacco. Buffer management. In L. Liu and M. T. Ö zsu, editors, Encyclopedia of Database Systems, pages 277--282. Springer US, 2009.
[46]
R. Sethi, M. Traverso, D. Sundstrom, D. Phillips, W. Xie, Y. Sun, N. Yegitbasi, H. Jin, E. Hwang, N. Shingte, and C. Berner. Presto: SQL on everything. In Proceedings of the 2019 IEEE International Conference on Data Engineering, ICDE, pages 1802--1813, 2019.
[47]
M. Sharique, A. K. Goel, and M. Andrei. Rollover strategies in a n-bit dictionary compressed column store, 2013. US Patent 9489409B2.
[48]
K. Shvachko, H. Kuang, S. Radia, and R. Chansler. The hadoop distributed file system. In Proceedings of the 2010 IEEE Conference on Mass Storage Systems and Technologies, MSST, pages 1--10, 2010.
[49]
M. Stonebraker, D. J. Abadi, A. Batkin, X. Chen, M. Cherniack, M. Ferreira, E. Lau, A. Lin, S. Madden, E. J. O'Neil, P. E. O'Neil, A. Rasin, N. Tran, and S. B. Zdonik. C-store: A column-oriented DBMS. In Proceedings of 2005 International Conference on Very Large Data Bases, VLDB, pages 553--564, 2005.
[50]
M. Stonebraker, D. J. Abadi, D. J. DeWitt, S. Madden, E. Paulson, A. Pavlo, and A. Rasin. MapReduce and parallel DBMSs: friends or foes? Commun. ACM, 53(1):64--71, 2010.
[51]
J. Tan, T. Ghanem, M. Perron, X. Yu, M. Stonebraker, D. J. DeWitt, M. Serafini, A. Aboulnaga, and T. Kraska. Choosing A cloud DBMS: architectures and tradeoffs. Proc. VLDB Endow., 12(12):2170--2182, 2019.
[52]
B. Vandiver, S. Prasad, P. Rana, E. Zik, A. Saeidi, P. Parimal, S. Pantela, and J. Dave. Eon mode: Bringing the vertica columnar database to the cloud. In G. Das, C. M. Jermaine, and P. A. Bernstein, editors, Proceedings of the 2018 ACM International Conference on Management of Data, SIGMOD, pages 797--809, 2018.
[53]
A. Verbitski, A. Gupta, D. Saha, M. Brahmadesam, K. Gupta, R. Mittal, S. Krishnamurthy, S. Maurice, T. Kharatishvili, and X. Bao. Amazon aurora: Design considerations for high throughput cloud-native relational databases. In Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD, pages 1041--1052, 2017.
[54]
M. Vuppalapati, J. Miron, R. Agarwal, D. Truong, A. Motivala, and T. Cruanes. Building an elastic query engine on disaggregated storage. In Proceedings of the 2020 Symposium on Networked Systems Design and Implementation, USENIX, pages 449--462, 2020.

Cited By

View all

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data
June 2021
2969 pages
ISBN:9781450383431
DOI:10.1145/3448016
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 June 2021

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. caching
  2. cloud-native storage
  3. garbage collection
  4. snapshots

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '21
Sponsor:

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)27
  • Downloads (Last 6 weeks)5
Reflects downloads up to 09 Jan 2025

Other Metrics

Citations

Cited By

View all

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media