skip to main content
10.1145/3183713.3183733acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article
Open access

Adaptive Optimization of Very Large Join Queries

Published: 27 May 2018 Publication History

Abstract

The use of business intelligence tools and other means to generate queries has led to great variety in the size of join queries. While most queries are reasonably small, join queries with up to a hundred relations are not that exotic anymore, and the distribution of query sizes has an incredible long tail. The largest real-world query that we are aware of accesses more than 4,000 relations. This large spread makes query optimization very challenging. Join ordering is known to be NP-hard, which means that we cannot hope to solve such large problems exactly. On the other hand most queries are much smaller, and there is no reason to sacrifice optimality there. This paper introduces an adaptive optimization framework that is able to solve most common join queries exactly, while simultaneously scaling to queries with thousands of joins. A key component there is a novel search space linearization technique that leads to near-optimal execution plans for large classes of queries. In addition, we describe implementation techniques that are necessary to scale join ordering algorithms to these extremely large queries. Extensive experiments with over 10 different approaches show that the new adaptive approach proposed here performs excellent over a huge spectrum of query sizes, and produces optimal or near-optimal solutions for most common queries.

Supplementary Material

Read me (3183713.3183733_readme.pdf)
Source Code (3183713.3183733_source_code.zip)

References

[1]
Renzo Angles, Peter A. Boncz, Josep-Lluis Larriba-Pey, Irini Fundulaki, Thomas Neumann, Orri Erling, Peter Neubauer, Norbert Martínez-Bazan, Venelin Kotsev, and Ioan Toma. 2014. The linked data benchmark council: a graph and RDF industry benchmarking effort. SIGMOD Record 43, 1 (2014), 27--31.
[2]
Nicolas Bruno, César A. Galindo-Legaria, and Milind Joshi. 2010. Polynomial heuristics for query optimization. In Proceedings of the 26th International Conference on Data Engineering, ICDE 2010, March 1--6, 2010, Long Beach, California, USA. 589--600.
[3]
Sourav Chatterji, Sai Surya Kiran Evani, Sumit Ganguly, and Mahesh Datt Yemmanuru. 2002. On the Complexity of Approximate Query Optimization. In Proceedings of the Twenty-first ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, June 3--5, Madison, Wisconsin, USA. 282--292.
[4]
Yijou Chen, Richard L. Cole, William J. McKenna, Sergei Perfilov, Aman Sinha, and Eugene Szedenits Jr. 2009. Partial join order optimization in the paraccel analytic database. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2009, Providence, Rhode Island, USA, June 29 - July 2, 2009. 905--908.
[5]
Sophie Cluet and Guido Moerkotte. 1995. On the Complexity of Generating Optimal Left-Deep Processing Trees with Cross Products. In Database Theory - ICDT'95, 5th International Conference, Prague, Czech Republic, January 11--13, 1995, Proceedings. 54--67.
[6]
David DeHaan and Frank Wm. Tompa. 2007. Optimal top-down join enumeration. In Proceedings of the ACM SIGMOD International Conference on Management of Data, Beijing, China, June 12--14, 2007. 785--796.
[7]
Nicolas Dieu, Adrian Dragusanu, Françoise Fabret, François Llirbat, and Eric Simon. 2009. 1, 000 Tables Inside the From. PVLDB 2, 2 (2009), 1450--1461. http://www.vldb.org/pvldb/2/vldb09--1077.pdf
[8]
Leonidas Fegaras. 1998. A New Heuristic for Optimizing Large Queries. In Database and Expert Systems Applications, 9th International Conference, DEXA '98, Vienna, Austria, August 24--28, 1998, Proceedings. 726--735.
[9]
Pit Fender and Guido Moerkotte. 2013. Counter Strike: Generic Top-Down Join Enumeration for Hypergraphs. PVLDB 6, 14 (2013), 1822--1833. http://www.vldb. org/pvldb/vol6/p1822-fender.pdf
[10]
Gurobi Optimization, Inc. 2016. Gurobi Optimizer Reference Manual. (2016). http://www.gurobi.com
[11]
Laura M. Haas, Michael J. Carey, Miron Livny, and Amit Shukla. 1997. Seeking the Truth About ad hoc Join Costs. VLDB J. 6, 3 (1997), 241--256.
[12]
R. Hipp et al. 2015. SQLite (Version 3.8.10.2). SQLite Development Team. Available from https://www.sqlite.org/download.html. (2015).
[13]
Toshihide Ibaraki and Tiko Kameda. 1984. On the Optimal Nesting Order for Computing N-Relational Joins. ACM Trans. Database Syst. 9, 3 (1984), 482--502.
[14]
Donald Kossmann and Konrad Stocker. 2000. Iterative dynamic programming: a new class of query optimization algorithms. ACM Trans. Database Syst. 25, 1 (2000), 43--82.
[15]
Ravi Krishnamurthy, Haran Boral, and Carlo Zaniolo. 1986. Optimization of Nonrecursive Queries. In VLDB'86 Twelfth International Conference on Very Large Data Bases, August 25--28, 1986, Kyoto, Japan, Proceedings. 128--137. http://www. vldb.org/conf/1986/P128.PDF
[16]
Joseph B Kruskal. 1956. On the shortest spanning subtree of a graph and the traveling salesman problem. Proceedings of the American Mathematical society 7, 1 (1956), 48--50.
[17]
Viktor Leis, Andrey Gubichev, Atanas Mirchev, Peter A. Boncz, Alfons Kemper, and Thomas Neumann. 2015. How Good Are Query Optimizers, Really? PVLDB 9, 3 (2015), 204--215. http://www.vldb.org/pvldb/vol9/p204-leis.pdf
[18]
Guy M Lohman. 2014. Is query optimization a "solved" problem. In Proc. Workshop on Database Query Optimization. Oregon Graduate Center Comp. Sci. Tech. Rep, 13.
[19]
Norman May, Alexander Böhm, and Wolfgang Lehner. 2017. SAP HANA - The Evolution of an In-Memory DBMS from Pure OLAP Processing Towards Mixed Workloads. In Datenbanksysteme für Business, Technologie und Web (BTW 2017), 17. Fachtagung des GI-Fachbereichs "Datenbanken und Informationssysteme" (DBIS), 6.-10. März 2017, Stuttgart, Germany, Proceedings. 545--563.
[20]
Guido Moerkotte, Pit Fender, and Marius Eich. 2013. On the correct and complete enumeration of the core search space. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2013, New York, NY, USA, June 22--27, 2013. 493--504.
[21]
Guido Moerkotte and Thomas Neumann. 2006. Analysis of Two Existing and One New Dynamic Programming Algorithm for the Generation of Optimal Bushy Join Trees without Cross Products. In Proceedings of the 32nd International Conference on Very Large Data Bases, Seoul, Korea, September 12--15, 2006. 930--941. http://dl.acm.org/citation.cfm?id=1164207
[22]
Guido Moerkotte and Thomas Neumann. 2008. Dynamic programming strikes back. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2008, Vancouver, BC, Canada, June 10--12, 2008. 539--552.
[23]
Thomas Neumann. 2009. Query simplification: graceful degradation for joinorder optimization. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2009, Providence, Rhode Island, USA, June 29 - July 2, 2009. 403--414.
[24]
Patricia G. Selinger, Morton M. Astrahan, Donald D. Chamberlin, Raymond A. Lorie, and Thomas G. Price. 1979. Access Path Selection in a Relational Database Management System. In Proceedings of the 1979 ACM SIGMOD International Conference on Management of Data, Boston, Massachusetts, May 30 - June 1. 23--34.
[25]
Anil Shanbhag and S. Sudarshan. 2014. Optimizing Join Enumeration in Transformation-based Query Optimizers. PVLDB 7, 12 (2014), 1243--1254. http://www.vldb.org/pvldb/vol7/p1243-shanbhag.pdf
[26]
Michael Steinbrunn, Guido Moerkotte, and Alfons Kemper. 1997. Heuristic and Randomized Optimization for the Join Ordering Problem. VLDB J. 6, 3 (1997), 191--208.
[27]
Arun N. Swami. 1989. Optimization of Large Join Queries: Combining Heuristic and Combinatorial Techniques. In Proceedings of the 1989 ACM SIGMOD International Conference on Management of Data, Portland, Oregon, May 31 - June 2, 1989. 367--376.
[28]
Transaction Processing Performance Council 2017. TPC Benchmark DS. Transaction Processing Performance Council. http://www.tpc.org/
[29]
Transaction Processing Performance Council 2017. TPC Benchmark H. Transaction Processing Performance Council. http://www.tpc.org/
[30]
Immanuel Trummer and Christoph Koch. 2017. Solving the Join Ordering Problem via Mixed Integer Linear Programming. In Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD Conference 2017, Chicago, IL, USA, May 14--19, 2017. 1025--1040.
[31]
Florian Waas and Arjan Pellenkoft. 2000. Join Order Selection - Good Enough Is Easy. In Advances in Databases, 17th British National Conferenc on Databases, BNCOD 17, Exeter, UK, July 3--5, 2000, Proceedings. 51--67.

Cited By

View all

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGMOD '18: Proceedings of the 2018 International Conference on Management of Data
May 2018
1874 pages
ISBN:9781450347037
DOI:10.1145/3183713
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication Notes

Badge change: Article originally badged under Version 1.0 guidelines https://www.acm.org/publications/policies/artifact-review-badging

Publication History

Published: 27 May 2018

Permissions

Request permissions for this article.

Check for updates

Badges

Qualifiers

  • Research-article

Funding Sources

  • European Research Council

Conference

SIGMOD/PODS '18
Sponsor:

Acceptance Rates

SIGMOD '18 Paper Acceptance Rate 90 of 461 submissions, 20%;
Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)405
  • Downloads (Last 6 weeks)57
Reflects downloads up to 28 Jan 2025

Other Metrics

Citations

Cited By

View all

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media