Joins via Geometric Resolutions: Worst Case and Beyond

Published: 08 November 2016

Abstract

We present a simple geometric framework for the relational join. Using this framework, we design an algorithm that achieves the fractional hypertree-width bound, which generalizes classical and recent worst-case algorithmic results on computing joins. In addition, we use our framework and the same algorithm to show a series of what are colloquially known as beyond worst-case results. The framework allows us to prove results for data stored in BTrees, multidimensional data structures, and even multiple indices per table. A key idea in our framework is formalizing the inference one does with an index as a type of geometric resolution, transforming the algorithmic problem of computing joins to a geometric problem. Our notion of geometric resolution can be viewed as a geometric analog of logical resolution. In addition to the geometry and logic connections, our algorithm can also be thought of as backtracking search with memoization.
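The analogy between geometric and logical resolution can be illustrated with a small sketch. It rests on assumptions the abstract does not state: each box is encoded as a tuple of bit-string prefixes, one per attribute, with the empty string standing for the whole domain. The names `Box`, `is_prefix`, and `resolve` are hypothetical and chosen only for illustration. Two boxes resolve when they are sibling halves in exactly one dimension and nested in every other; the resolvent lies inside their union, much as a logical resolvent follows from its two parent clauses.

```python
from typing import Optional, Tuple

# Assumed encoding: one bit-string prefix per attribute; "" covers the whole domain.
Box = Tuple[str, ...]


def is_prefix(a: str, b: str) -> bool:
    """True if interval b lies inside interval a (a is a prefix of b)."""
    return b.startswith(a)


def resolve(b1: Box, b2: Box) -> Optional[Box]:
    """Return the resolvent of two boxes, or None if they do not resolve.

    The boxes must be siblings (same prefix, last bit differs) in exactly
    one pivot dimension and prefix-comparable in all other dimensions.
    """
    if len(b1) != len(b2):
        return None
    pivot = None
    out = []
    for i, (p, q) in enumerate(zip(b1, b2)):
        if is_prefix(p, q):
            out.append(q)  # q is the narrower interval: keep the intersection
        elif is_prefix(q, p):
            out.append(p)  # p is the narrower interval
        elif pivot is None and len(p) == len(q) and p[:-1] == q[:-1]:
            pivot = i
            out.append(p[:-1])  # merge the two sibling halves into their parent
        else:
            return None  # disjoint non-siblings, or a second pivot dimension
    return tuple(out) if pivot is not None else None


# ("0", "") covers the left half of the plane; ("1", "0") covers one quarter
# of the right half. Together they cover the full-width strip ("", "0").
print(resolve(("0", ""), ("1", "0")))
```

In the logical reading, the pivot dimension plays the role of the variable being resolved on, and taking the longer prefix in every other dimension corresponds to collecting the remaining literals of both clauses.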

Supplementary Material

a22-khamis-apndx.pdf (khamis.zip)
Supplemental movie, appendix, image, and software files for Joins via Geometric Resolutions: Worst Case and Beyond

Published In

ACM Transactions on Database Systems, Volume 41, Issue 4
Invited Paper from EDBT 2015, Invited Paper from PODS 2015 and Regular Papers
December 2016
309 pages
ISSN:0362-5915
EISSN:1557-4644
DOI:10.1145/3014437
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 November 2016
Accepted: 01 June 2016
Revised: 01 June 2016
Received: 01 December 2015
Published in TODS Volume 41, Issue 4

Author Tags

  1. Relational join
  2. beyond worst-case analysis
  3. bounded-width join queries
  4. indices
  5. resolution

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • DARPA's XDATA Program
  • American Family Insurance
  • NSF
  • NSF CAREER
  • Toshiba
  • EarthCube Award
  • DEFT Program
  • ONR
  • Sloan Research Fellowship
  • DARPA's MEMEX program
  • Moore Foundation Data Driven Investigator Award
  • Google
  • Lightspeed Ventures
