H2O: a hands-free adaptive store

Authors: Ioannis Alagiannis, Stratos Idreos, Anastasia Ailamaki

SIGMOD '14: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data

Pages 1103 - 1114

https://doi.org/10.1145/2588555.2610502

Published: 18 June 2014

Abstract

Modern state-of-the-art database systems are designed around a single data storage layout. This fixed decision drives the whole architectural design of a database system, e.g., as a row-store or a column-store. However, neither choice is universally good; different workloads require different storage layouts and data access methods to achieve good performance.
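The layout trade-off can be illustrated with a toy example. The sketch below is not taken from the paper: the function names are hypothetical, the table has three integer attributes, and "values touched" is only a crude stand-in for the amount of data a real engine would read. It shows why a single-attribute scan favors a column layout while fetching a whole tuple favors a row layout.

```python
# Illustrative sketch only; all names here are hypothetical, not H2O's API.
from typing import Dict, List, Tuple

Row = Tuple[int, int, int]  # three integer attributes: a, b, c


def to_row_store(rows: List[Row]) -> List[Row]:
    """Row layout: the attributes of each tuple are stored together."""
    return list(rows)


def to_column_store(rows: List[Row]) -> Dict[str, List[int]]:
    """Column layout: each attribute is stored in its own array."""
    a, b, c = zip(*rows) if rows else ((), (), ())
    return {"a": list(a), "b": list(b), "c": list(c)}


def sum_a_row_store(table: List[Row]) -> Tuple[int, int]:
    """Sum attribute a; returns (result, values touched)."""
    total, touched = 0, 0
    for row in table:
        touched += len(row)  # the whole tuple is read to reach one attribute
        total += row[0]
    return total, touched


def sum_a_column_store(table: Dict[str, List[int]]) -> Tuple[int, int]:
    """Sum attribute a; only the column for a is read."""
    col = table["a"]
    return sum(col), len(col)


def fetch_row_store(table: List[Row], i: int) -> Tuple[Row, int]:
    """Fetch tuple i; returns (tuple, number of separate accesses)."""
    return table[i], 1  # one contiguous record


def fetch_column_store(table: Dict[str, List[int]], i: int) -> Tuple[Row, int]:
    """Fetch tuple i by stitching it together from every column."""
    return (table["a"][i], table["b"][i], table["c"][i]), 3
```

In this toy model the column layout touches one third of the values for the scan, but needs three separate accesses to reconstruct a tuple that the row layout returns in one.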

In this paper, we present the H2O system, which introduces two novel concepts. First, it is flexible enough to support multiple storage layouts and data access patterns in a single engine. Second, and most importantly, it decides on the fly, i.e., during query processing, which design is best for each class of queries and the respective parts of the data. At any given point in time, parts of the data might be materialized in various patterns depending purely on the query workload; as the workload changes, the storage and access patterns continuously adapt with every query. In this way, H2O makes no fixed, a priori decisions on how data should be stored, allowing each query to enjoy a storage and access pattern tailored to its specific properties.
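The idea of adapting the layout during query processing can be sketched in a few lines. The following is a deliberately simplified illustration and not the mechanism described in the paper: the class `ToyAdaptiveStore`, the `row_sums` query, and the fixed `threshold` trigger are all assumptions made for this example. The store starts with one array per attribute and, once the same attribute set has been queried often enough, materializes a row-major group for exactly those attributes.

```python
# Hypothetical sketch of workload-driven layout adaptation; not H2O's algorithm.
from collections import Counter
from typing import Dict, FrozenSet, List, Sequence, Tuple


class ToyAdaptiveStore:
    """Starts column-major and adds row-major column groups on demand."""

    def __init__(self, columns: Dict[str, List[int]], threshold: int = 2) -> None:
        self.columns = columns      # base layout: one array per attribute
        self.threshold = threshold  # assumed trigger, chosen for illustration
        self.access_counts: Counter = Counter()
        self.groups: Dict[FrozenSet[str], List[Tuple[int, ...]]] = {}

    def _materialize(self, attrs: FrozenSet[str]) -> None:
        """Copy the requested attributes into one row-major group."""
        order = sorted(attrs)
        self.groups[attrs] = list(zip(*(self.columns[a] for a in order)))

    def row_sums(self, attrs: Sequence[str]) -> Tuple[List[int], str]:
        """Per-tuple sum over attrs; also reports which layout served the query."""
        key = frozenset(attrs)
        self.access_counts[key] += 1
        if key in self.groups:
            # A tailored layout already exists for this attribute set.
            return [sum(r) for r in self.groups[key]], "column-group"
        # Otherwise stitch the tuples together from the individual columns.
        result = [sum(vals) for vals in zip(*(self.columns[a] for a in key))]
        if self.access_counts[key] >= self.threshold:
            # The pattern is frequent enough, so adapt for future queries.
            self._materialize(key)
        return result, "columns"
```

With `threshold=2`, the first two queries over `["a", "b"]` are answered from the base columns and the third is served from the newly materialized group. A real system would weigh the cost of reorganization against the expected benefit rather than rely on a fixed counter.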

We present a detailed analysis of H2O using both synthetic benchmarks and realistic scientific workloads. We demonstrate that while existing systems cannot achieve maximum performance across all workloads, H2O can always match the best case performance without requiring any tuning or workload knowledge.

Cited By

Zhang C, Li G, Zhang J, Zhang X, Feng J (2024). HTAP Databases: A Survey. IEEE Transactions on Knowledge and Data Engineering 36:11 (6410-6429). DOI: 10.1109/TKDE.2024.3389693. Online publication date: Nov-2024.
https://doi.org/10.1109/TKDE.2024.3389693
Aref W (2024). On Native Location-Optimized Data Systems. 2024 IEEE 40th International Conference on Data Engineering (ICDE) (5675-5676). DOI: 10.1109/ICDE60146.2024.00469. Online publication date: 13-May-2024.
https://doi.org/10.1109/ICDE60146.2024.00469
Song H, Zhou W, Cui H, Peng X, Li F (2024). A survey on hybrid transactional and analytical processing. The VLDB Journal 33:5 (1485-1515). DOI: 10.1007/s00778-024-00858-9. Online publication date: 4-Jun-2024.
https://doi.org/10.1007/s00778-024-00858-9

Index Terms

H2O: a hands-free adaptive store
1. Information systems
  1. Data management systems
    1. Data structures
      1. Data access methods
    2. Database management system engines
      1. Database query processing
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory
      1. Database query processing and optimization (theory)

Published In

SIGMOD '14: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data

June 2014

1645 pages

ISBN:9781450323765

DOI:10.1145/2588555

General Chairs: Curtis Dyreson (Utah State University, USA), Feifei Li (University of Utah, USA)
Program Chair: M. Tamer Özsu (University of Waterloo, Canada)

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMOD: ACM Special Interest Group on Management of Data

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers: Research-article

Conference

SIGMOD/PODS'14: International Conference on Management of Data
Sponsor: SIGMOD
June 22 - 27, 2014
Snowbird, Utah, USA

Acceptance Rates

SIGMOD '14 Paper Acceptance Rate 107 of 421 submissions, 25%;

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Bibliometrics

Total Citations: 93
Total Downloads: 946
Downloads (Last 12 months): 40
Downloads (Last 6 weeks): 4

Reflects downloads up to 06 Nov 2024

Citations

Cited By

Liu P, Cai P, Li C, Chen H (2024). AVPS: Automatic Vertical Partitioning for Dynamic Workload. Advanced Intelligent Computing Technology and Applications (146-157). DOI: 10.1007/978-981-97-5618-6_13. Online publication date: 1-Aug-2024.
https://doi.org/10.1007/978-981-97-5618-6_13
Mun J, Karatsenidis K, Papon T, Roozkhosh S, Hoornaert D, Drepper U, Sanaullah A, Mancuso R, Athanassoulis M (2023). On-the-Fly Data Transformation in Action. Proceedings of the VLDB Endowment 16:12 (3950-3953). DOI: 10.14778/3611540.3611593. Online publication date: 1-Aug-2023.
https://dl.acm.org/doi/10.14778/3611540.3611593
Faria N, Pereira J, Alonso A, Vilaça R, Koning Y, Nes N (2023). TiQuE: Improving the Transactional Performance of Analytical Systems for True Hybrid Workloads. Proceedings of the VLDB Endowment 16:9 (2274-2288). DOI: 10.14778/3598581.3598598. Online publication date: 10-Jul-2023.
https://dl.acm.org/doi/10.14778/3598581.3598598
Sarkar S, Papon T, Staratzis D, Zhu Z, Athanassoulis M (2023). Enabling Timely and Persistent Deletion in LSM-Engines. ACM Transactions on Database Systems 48:3 (1-40). DOI: 10.1145/3599724. Online publication date: 9-Aug-2023.
https://dl.acm.org/doi/10.1145/3599724
Papon T, Hyoung Mun J, Roozkhosh S, Hoornaert D, Sanaullah A, Drepper U, Mancuso R, Athanassoulis M (2023). Relational Fabric: Transparent Data Transformation. 2023 IEEE 39th International Conference on Data Engineering (ICDE) (3688-3698). DOI: 10.1109/ICDE55515.2023.00297. Online publication date: Apr-2023.
https://doi.org/10.1109/ICDE55515.2023.00297
Saxena H, Golab L, Idreos S, Ilyas I (2023). Real-Time LSM-Trees for HTAP Workloads. 2023 IEEE 39th International Conference on Data Engineering (ICDE) (1208-1220). DOI: 10.1109/ICDE55515.2023.00097. Online publication date: Apr-2023.
https://doi.org/10.1109/ICDE55515.2023.00097
Vinçon T, Knödler C, Solis-Vasquez L, Bernhardt A, Tamimi S, Weber L, Stock F, Koch A, Petrov I (2022). Near-data processing in database systems on native computational storage under HTAP workloads. Proceedings of the VLDB Endowment 15:10 (1991-2004). DOI: 10.14778/3547305.3547307. Online publication date: 7-Sep-2022.
https://dl.acm.org/doi/10.14778/3547305.3547307
