research-article

Accurate summary-based cardinality estimation through the lens of cardinality estimation graphs

Authors:

Semih Salihoglu,

Ken Salem

Proceedings of the VLDB Endowment, Volume 15, Issue 8

Pages 1533 - 1545

https://doi.org/10.14778/3529337.3529339

Published: 01 April 2022

Abstract

This paper is an experimental and analytical study of two classes of summary-based cardinality estimators that use statistics about input relations and small-size joins in the context of graph database management systems: (i) optimistic estimators that make uniformity and conditional independence assumptions; and (ii) the recent pessimistic estimators that use information-theoretic linear programs (LPs). We begin by analyzing how optimistic estimators use pre-computed statistics to generate cardinality estimates. We show that these estimators can be modeled as picking bottom-to-top paths in a cardinality estimation graph (CEG), whose nodes are sub-queries and whose edge weights are average degree statistics. We show that existing optimistic estimators make either undefined or fixed choices when picking CEG paths as their estimates, and ignore alternative choices. Instead, we outline a space of optimistic estimators on CEGs that subsumes existing estimators. We show, through an extensive empirical analysis, that which paths are effective depends on the structure of the query. On acyclic queries and queries with small cycles, using the maximum-weight path effectively addresses the well-known underestimation problem; on queries with larger cycles, these estimates tend to overestimate, which can be addressed by using minimum-weight paths. We next show that optimistic estimators and the seemingly disparate LP-based pessimistic estimators are in fact connected. Specifically, we show that CEGs can also model some recent pessimistic estimators. This connection allows us to adopt an optimization from pessimistic estimators in optimistic ones, and provides insights into the pessimistic estimators, such as showing that they have combinatorial solutions.
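To make the path-picking view concrete, here is a minimal sketch in Python. It is not the paper's implementation. The toy CEG, its node labels, and every weight are invented for illustration, and the function name `estimate` is hypothetical. It assumes that a path's estimate is the product of its edge weights, that all weights are positive, and that the CEG is acyclic.

```python
# Hypothetical toy CEG (all numbers invented). Each node is a sub-query,
# named by the query edges it covers. Each outgoing edge carries an
# average-degree style weight for extending that sub-query to a larger one.
CEG = {
    "{}": [("e1", 100.0), ("e2", 80.0)],
    "e1": [("e1,e2", 3.0)],
    "e2": [("e1,e2", 4.0)],
    # Two parallel edges: two different statistics can drive the same extension.
    "e1,e2": [("e1,e2,e3", 2.0), ("e1,e2,e3", 5.0)],
    "e1,e2,e3": [],
}


def estimate(ceg, source, target, pick=max):
    """Return the product of edge weights along one source-to-target path.

    `pick` chooses among alternatives at each node: `max` yields the
    maximum-weight path, `min` the minimum-weight path. Assumes positive
    weights and an acyclic graph, so the per-node choice is also optimal
    for the whole path.
    """
    memo = {}

    def best(node):
        if node == target:
            return 1.0
        if node not in memo:
            options = []
            for child, weight in ceg.get(node, []):
                rest = best(child)
                if rest is not None:  # skip children that cannot reach target
                    options.append(weight * rest)
            memo[node] = pick(options) if options else None
        return memo[node]

    result = best(source)
    if result is None:
        raise ValueError("target is not reachable from source")
    return result


max_estimate = estimate(CEG, "{}", "e1,e2,e3", pick=max)  # 80 * 4 * 5 = 1600.0
min_estimate = estimate(CEG, "{}", "e1,e2,e3", pick=min)  # 100 * 3 * 2 = 600.0
```

On this toy graph the four bottom-to-top paths give 600, 640, 1500, and 1600, so the choice of path alone changes the estimate by more than a factor of two. That spread is the kind of choice the abstract describes as either undefined or fixed in existing optimistic estimators.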

Accurate summary-based cardinality estimation through the lens of cardinality estimation graphs
1. Information systems
  1. Data management systems
    1. Database management system engines
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory

Published In

Proceedings of the VLDB Endowment Volume 15, Issue 8

April 2022

220 pages

ISSN:2150-8097

Editors: Fatma Özcan (Google), Juliana Freire (New York University), Xuemin Lin (University of New South Wales)

Publisher

VLDB Endowment

Badges

Artifacts Available / v1.1

Qualifiers

Research-article

Article Metrics

Total Citations: 5
Total Downloads: 164
Downloads (Last 12 months): 100
Downloads (Last 6 weeks): 35

Reflects downloads up to 21 Dec 2024

Cited By

Birler A, Kemper A, Neumann T (2024). Robust Join Processing with Diamond Hardened Joins. Proceedings of the VLDB Endowment 17:11 (3215-3228). DOI: 10.14778/3681954.3681995. Online publication date: 1-Jul-2024
https://dl.acm.org/doi/10.14778/3681954.3681995
Hu P, Motik B (2024). Accurate Sampling-Based Cardinality Estimation for Complex Graph Queries. ACM Transactions on Database Systems 49:3 (1-46). DOI: 10.1145/3689209. Online publication date: 17-Sep-2024
https://dl.acm.org/doi/10.1145/3689209
Birler A, Schmidt T, Fent P, Neumann T (2024). Simple, Efficient, and Robust Hash Tables for Join Processing. Proceedings of the 20th International Workshop on Data Management on New Hardware (1-9). DOI: 10.1145/3662010.3663442. Online publication date: 10-Jun-2024
https://dl.acm.org/doi/10.1145/3662010.3663442
Abo Khamis M, Nakos V, Olteanu D, Suciu D (2024). Join Size Bounds using lp-Norms on Degree Sequences. Proceedings of the ACM on Management of Data 2:2 (1-24). DOI: 10.1145/3651597. Online publication date: 14-May-2024
https://dl.acm.org/doi/10.1145/3651597
Deeds K, Suciu D, Balazinska M (2023). SafeBound: A Practical System for Generating Cardinality Bounds. Proceedings of the ACM on Management of Data 1:1 (1-26). DOI: 10.1145/3588907. Online publication date: 30-May-2023
https://dl.acm.org/doi/10.1145/3588907
