Scalable computation of acyclic joins

A Pagh, R Pagh - Proceedings of the twenty-fifth ACM SIGMOD-SIGACT …, 2006 - dl.acm.org
A Pagh, R Pagh
Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on …, 2006dl.acm.org
The join operation of relational algebra is a cornerstone of relational database systems.
Computing the join of several relations is NP-hard in general, whereas special (and typical)
cases are tractable. This paper considers joins having an acyclic join graph, for which
current methods initially apply a full reducer to efficiently eliminate tuples that will not
contribute to the result of the join. From a worst-case perspective, previous algorithms for
computing an acyclic join of k fully reduced relations, occupying a total of n≥ k blocks on …
The join operation of relational algebra is a cornerstone of relational database systems. Computing the join of several relations is NP-hard in general, whereas special (and typical) cases are tractable. This paper considers joins having an acyclic join graph, for which current methods initially apply a full reducer to efficiently eliminate tuples that will not contribute to the result of the join. From a worst-case perspective, previous algorithms for computing an acyclic join of k fully reduced relations, occupying a total of n≥k blocks on disk, use Ω((n+z)k) I/Os, where z is the size of the join result in blocks.In this paper we show how to compute the join in a time bound that is within a constant factor of the cost of running a full reducer plus sorting the output. For a broad class of acyclic join graphs this is O(sort(n+z)) I/Os, removing the dependence on k from previous bounds. Traditional methods decompose the join into a number of binary joins, which are then carried out one by one. Departing from this approach, our technique is based on computing the size of certain subsets of the result, and using these sizes to compute the location(s) of each data item in the result.Finally, as an initial study of cyclic joins in the I/O model, we show how to compute a join whose join graph is a 3-cycle, in O(n2/m+sort(n+z)) I/Os, where m is the number of blocks in internal memory.
ACM Digital Library