GeoGauss: Strongly Consistent and Light-Coordinated OLTP for Geo-Replicated SQL Database

Published: 30 May 2023

Abstract

Multinational enterprises conduct global business that demands geo-distributed transactional databases. Existing state-of-the-art databases adopt a sharded master-follower replication architecture. However, the single-master serving mode incurs massive cross-region writes from clients, and the sharded architecture requires multiple round-trip acknowledgments (e.g., 2PC) to ensure atomicity for cross-shard transactions. These limitations drive us to seek a different design. In this paper, we propose GeoGauss, a strongly consistent OLTP database with a full-replica multi-master architecture. To efficiently merge the updates from different master nodes, we propose a multi-master OCC that unifies data replication and concurrent transaction processing. By leveraging an epoch-based delta state merge rule and optimistic asynchronous execution, GeoGauss ensures strong consistency with a light-coordinated protocol and allows more concurrency with weak isolation, which are sufficient to meet our needs. Our geo-distributed experimental results show that GeoGauss achieves 7.06X higher throughput and 17.41X lower latency than the state-of-the-art geo-distributed database CockroachDB on the TPC-C benchmark.
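
To make the idea of an epoch-based merge concrete, the following is a minimal sketch and not the paper's actual algorithm. It assumes that every master collects the write sets of the transactions it executed optimistically during an epoch, that all replicas exchange those write sets at the epoch boundary, and that each replica then applies the same deterministic rule. The rule used here (first writer per row wins, ordered by a hypothetical `(commit_ts, node_id, txn_id)` key, and a transaction aborts if it loses any row) is an illustrative assumption; the names `WriteSet` and `merge_epoch` are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WriteSet:
    """Delta produced by one transaction on one master (hypothetical shape)."""
    txn_id: str
    node_id: int
    commit_ts: int
    writes: dict  # row key -> new value


def merge_epoch(write_sets):
    """Merge all write sets of one epoch; return (committed_txn_ids, merged_rows).

    Assumed rule, for illustration only: sort by (commit_ts, node_id, txn_id),
    let the first writer of each row win, and abort any transaction that is
    not the winner of every row it wrote. The rule is conservative: an aborted
    transaction still blocks later writers of the rows it won.
    """
    # Sorting makes the outcome independent of network arrival order,
    # so every replica reaches the same state without extra round trips.
    ordered = sorted(write_sets, key=lambda ws: (ws.commit_ts, ws.node_id, ws.txn_id))

    winner = {}  # row key -> txn_id of the first writer in the sorted order
    for ws in ordered:
        for key in ws.writes:
            winner.setdefault(key, ws.txn_id)

    committed, merged = [], {}
    for ws in ordered:
        if all(winner[key] == ws.txn_id for key in ws.writes):
            committed.append(ws.txn_id)
            merged.update(ws.writes)
    return committed, merged


if __name__ == "__main__":
    t1 = WriteSet("t1", node_id=1, commit_ts=10, writes={"x": 1})
    t2 = WriteSet("t2", node_id=2, commit_ts=11, writes={"x": 2, "y": 5})
    t3 = WriteSet("t3", node_id=3, commit_ts=12, writes={"z": 9})
    # Two replicas that received the same write sets in different orders
    # compute the same result.
    print(merge_epoch([t1, t2, t3]))
    print(merge_epoch([t3, t2, t1]))
```

Because the merge depends only on the set of write sets in the epoch, replicas in this sketch need to agree on epoch membership but not on a per-transaction order, which is one way to read "light-coordinated".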


    Published In

    Proceedings of the ACM on Management of Data  Volume 1, Issue 1
    PACMMOD
    May 2023
    2807 pages
    EISSN:2836-6573
    DOI:10.1145/3603164
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. deterministic databases
    2. geo-distributed
    3. multi-master replication
    4. replica consistency
    5. transaction processing
