DOI: 10.1145/3335783.3335792
research-article

Versioning in Main-Memory Database Systems: From MusaeusDB to TardisDB

Published: 23 July 2019

Abstract

As relational database systems do not support collaborative dataset editing, online lexicons, such as Wikipedia's MediaWiki, build their own version control on top of the database system to allow constraint-preserving version checkouts or commits involving multiple tables. To eliminate the need for purpose-specific solutions, we propose adding version control as a layer on top of the database system or integrating versioning into the database system's core.
This paper presents the first two architectures for versioning the entire state of a database system with respect to references among multiple relations. We design the prototype MusaeusDB as a solution for existing database systems, either as an external tool or as an extended SQL interface. The prototype TardisDB, an extended main-memory database system, reuses multi-version concurrency control for in-place updates while keeping older versions accessible. For performance tests on different storage layouts, we create the TardisBenchmark, which is based on Wikipedia's page history. Our results show that it is feasible to reduce wasted space while still ensuring constant retrieval time. Moreover, extending a main-memory database system's multi-version concurrency control has no negative impact on transactional throughput. For further research on database versioning, we offer a flexibly sized benchmark with time-evolving, text-based datasets and compression techniques.
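
The idea of updating in place while keeping older versions accessible can be illustrated with a small sketch. This is not the TardisDB implementation, which the abstract does not detail; the class `VersionedTable`, its methods `commit` and `read`, and the per-key history list are all hypothetical names chosen for illustration. The sketch assumes a single writer and integer version numbers.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class VersionedTable:
    """Toy table (hypothetical): newest value in place, older values in a history list."""
    # key -> (version that wrote it, value); this is the in-place, newest state
    current: Dict[int, Tuple[int, str]] = field(default_factory=dict)
    # key -> older (version, value) pairs, oldest first
    history: Dict[int, List[Tuple[int, str]]] = field(default_factory=dict)
    next_version: int = 1

    def commit(self, changes: Dict[int, str]) -> int:
        """Apply all changes under one new version number and return it."""
        version = self.next_version
        self.next_version += 1
        for key, value in changes.items():
            if key in self.current:
                # Move the overwritten value to the history before updating in place.
                self.history.setdefault(key, []).append(self.current[key])
            self.current[key] = (version, value)
        return version

    def read(self, key: int, version: Optional[int] = None) -> Optional[str]:
        """Return the value of key as of the given version (newest if None)."""
        if key not in self.current:
            return None
        cur_version, cur_value = self.current[key]
        if version is None or cur_version <= version:
            return cur_value
        # Walk the history from newest to oldest until a visible version is found.
        for old_version, old_value in reversed(self.history.get(key, [])):
            if old_version <= version:
                return old_value
        return None  # the key did not exist yet at that version


table = VersionedTable()
v1 = table.commit({1: "first draft"})
v2 = table.commit({1: "second draft", 2: "new page"})
print(table.read(1))      # newest value of key 1
print(table.read(1, v1))  # value of key 1 as of version v1
```

In this sketch, reads of the newest version touch only `current`, so their cost does not depend on how many older versions exist; only reads of past versions walk the history.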

Published In

SSDBM '19: Proceedings of the 31st International Conference on Scientific and Statistical Database Management
July 2019
244 pages
ISBN: 9781450362160
DOI: 10.1145/3335783
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. SQL
  2. Version control

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SSDBM '19

Acceptance Rates

Overall Acceptance Rate 56 of 146 submissions, 38%
