research-article

A forward scan based plane sweep algorithm for parallel interval joins

Published: 01 August 2017

Abstract

The interval join is a basic operation that finds application in temporal, spatial, and uncertain databases. Although a number of centralized and distributed algorithms have been proposed for the efficient evaluation of interval joins, classic plane sweep approaches have not been considered at their full potential. A recent piece of related work proposes an optimized approach based on plane sweep (PS) for modern hardware, showing that it greatly outperforms previous work. However, this approach depends on the development of a complex data structure, and its parallelization has not been adequately studied. In this paper, we explore the applicability of a largely ignored forward scan (FS) based plane sweep algorithm, which is extremely simple to implement. We propose two optimizations of FS that greatly reduce its cost, making it competitive with the state-of-the-art single-threaded PS algorithm while achieving a lower memory footprint. In addition, we show the drawbacks of a previously proposed hash-based partitioning approach for parallel join processing and suggest a domain-based partitioning approach that does not produce duplicate results. Within our approach, we propose a novel breakdown of the partition join jobs into a small number of independent mini-join jobs with varying cost and manage to avoid redundant comparisons. Finally, we show how these mini-joins can be scheduled on multiple CPU cores and propose an adaptive domain partitioning that aims at load balancing. We include an experimental study that demonstrates the efficiency of our optimized FS and the scalability of our parallelization framework.
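
To make the forward scan idea concrete, the following is a minimal sketch of a textbook-style FS plane sweep join. It is not the optimized algorithm of the paper: the two optimizations and the parallel partitioning mentioned in the abstract are not shown. The sketch assumes closed integer intervals represented as `(start, end)` tuples, and the names `Interval` and `forward_scan_join` are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

# Assumption: an interval is a (start, end) tuple, closed on both sides.
Interval = Tuple[int, int]


def forward_scan_join(R: List[Interval],
                      S: List[Interval]) -> List[Tuple[Interval, Interval]]:
    """Return every (r, s) pair, r from R and s from S, whose intervals overlap."""
    R = sorted(R)  # sort both inputs by start endpoint
    S = sorted(S)
    out = []
    i = j = 0
    while i < len(R) and j < len(S):
        if R[i][0] <= S[j][0]:
            # R[i] starts first: scan forward in S for all intervals
            # that start no later than the end of R[i].
            r = R[i]
            k = j
            while k < len(S) and S[k][0] <= r[1]:
                out.append((r, S[k]))
                k += 1
            i += 1
        else:
            # S[j] starts first: symmetric forward scan over R.
            s = S[j]
            k = i
            while k < len(R) and R[k][0] <= s[1]:
                out.append((R[k], s))
                k += 1
            j += 1
    return out


if __name__ == "__main__":
    R = [(1, 5), (3, 8), (10, 12)]
    S = [(2, 4), (5, 6), (9, 10), (13, 14)]
    for pair in forward_scan_join(R, S):
        print(pair)
```

Each pair is reported exactly once, during the scan started by whichever of its two intervals begins first, so no auxiliary structure of "active" intervals is needed. This is what keeps the approach simple and its memory footprint low, at the price of extra endpoint comparisons during the forward scans.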


    Published In

    Proceedings of the VLDB Endowment  Volume 10, Issue 11
    August 2017
    432 pages
    ISSN:2150-8097

    Publisher

    VLDB Endowment
