skip to main content
research-article

Theoretically optimal and empirically efficient r-trees with strong parallelizability

Published: 01 January 2018 Publication History

Abstract

The massive amount of data and large variety of data distributions in the big data era call for access methods that are efficient in both query processing and index bulk-loading, and over both practical and worst-case workloads. To address this need, we revisit a classic multidimensional access method - the R-tree. We propose a novel R-tree packing strategy that produces R-trees with an asymptotically optimal I/O complexity for window queries in the worst case. Our experiments show that the R-trees produced by the proposed strategy are highly efficient on real and synthetic data of different distributions. The proposed strategy is also simple to parallelize, since it relies only on sorting. We propose a parallel algorithm for R-tree bulk-loading based on the proposed packing strategy, and analyze its performance under the massively parallel communication model. Experimental results confirm the efficiency and scalability of the parallel algorithm over large data sets.

References

[1]
http://www.pokemongo.com.
[2]
http://en.wikipedia.org/wiki/Google_Search.
[3]
https://www.wired.com/2016/09/pokemon-go-just-fine-without/.
[4]
http://www.microsoft.com/sqlserver/2008/en/us/spatial-data.aspx.
[5]
http://download.oracle.com/otndocs/products/spatial/pdf/spatial_features_jsirev.pdf.
[6]
https://www.cse.ust.hk/~yike/prtree/.
[7]
http://madalgo.au.dk/tpie/.
[8]
https://nectar.org.au/.
[9]
https://www.census.gov/geo/maps-data/data/tiger-line.html.
[10]
D. Achakeev, M. Seidemann, M. Schmidt, and B. Seeger. Sort-based parallel loading of r-trees. In 1st ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data, pages 62--70, 2012.
[11]
P. K. Agarwal, L. Arge, O. Procopiuc, and J. S. Vitter. A framework for index bulk loading and dynamization. In 28th International Colloquium on Automata, Languages and Programming, pages 115--127, 2001.
[12]
P. K. Agarwal, M. de Berg, J. Gudmundsson, M. Hammar, and H. J. Haverkort. Box-trees and r-trees with near-optimal query time. In 17th Annual Symposium on Computational Geometry (SCG), pages 124--133, 2001.
[13]
P. K. Agarwal, K. Fox, K. Munagala, and A. Nath. Parallel algorithms for constructing range and nearest-neighbor searching data structures. In PODS, pages 429--440, 2016.
[14]
A. Andoni, A. Nikolov, K. Onak, and G. Yaroslavtsev. Parallel algorithms for geometric graph problems. In STOC, pages 574--583, 2014.
[15]
L. Arge, M. D. Berg, H. Haverkort, and K. Yi. The priority r-tree: A practically efficient and worst-case optimal r-tree. ACM Transactions on Algorithms, 4(1):9:1--9:30, 2008.
[16]
P. Beame, P. Koutris, and D. Suciu. Communication steps for parallel query processing. In PODS, pages 273--284, 2013.
[17]
N. Beckmann, H.-P. Kriegel, R. Schneider, and B. Seeger. The r*-tree: An efficient and robust access method for points and rectangles. In SIGMOD, pages 322--331, 1990.
[18]
J. L. Bentley. Multidimensional binary search trees used for associative searching. Communications of the ACM, 18(9):509--517, 1975.
[19]
S. Berchtold, D. A. Keim, and H.-P. Kriegel. The x-tree : An index structure for high-dimensional data. In VLDB, pages 28--39, 1996.
[20]
M. Berg, M. Kreveld, M. Overmars, and O. C. Schwarzkopf. Computational Geometry. Springer Berlin Heidelberg, 2000.
[21]
B. Chazelle. Functional approach to data structures and its use in multidimensional searching. SIAM Journal on Computing, 17(3):427--462, 1988.
[22]
J. Dean and S. Ghemawat. Mapreduce: Simplified data processing on large clusters. Communications of the ACM, 51(1):107--113, 2008.
[23]
D. J. DeWitt, N. Kabra, J. Luo, J. M. Patel, and J.-B. Yu. Client-server paradise. In VLDB, pages 558--569, 1994.
[24]
H. N. Gabow, J. L. Bentley, and R. E. Tarjan. Scaling and related techniques for geometry problems. In STOC, pages 135--143, 1984.
[25]
V. Gaede and O. Günther. Multidimensional access methods. ACM Computing Surveys, 30(2):170--231, 1998.
[26]
Y. J. García R, M. A. López, and S. T. Leutenegger. A greedy algorithm for bulk loading r-trees. In 6th ACM International Symposium on Advances in Geographic Information Systems, pages 163--164, 1998.
[27]
M. T. Goodrich. Communication-efficient parallel sorting. SIAM Journal on Computing, 29(2):416--432, 1999.
[28]
R. Grossi and G. F. Italiano. Efficient cross-trees for external memory. In J. M. Abello and J. S. Vitter, editors, External Memory Algorithms and Visualization, pages 87--106, 1999.
[29]
A. Guttman. R-trees: A dynamic index structure for spatial searching. In SIGMOD, pages 47--57, 1984.
[30]
H. Haverkort and F. V. Walderveen. Four-dimensional hilbert curves for r-trees. Journal of Experimental Algorithmics, 16:1--19, 2008.
[31]
H. V. Jagadish. Spatial search with polyhedra. In ICDE, pages 311--319, 1990.
[32]
H. V. Jagadish. Analysis of the hilbert curve for representing two-dimensional space. Information Processing Letters, 62(1):17--22, 1997.
[33]
I. Kamel and C. Faloutsos. Parallel r-trees. In SIGMOD, pages 195--204, 1992.
[34]
I. Kamel and C. Faloutsos. Hilbert r-tree: An improved r-tree using fractals. In VLDB, pages 500--509, 1994.
[35]
K. V. R. Kanth and A. K. Singh. Optimal dynamic range searching in non-replicating index structures. In ICDT, pages 257--276, 1999.
[36]
N. Koudas, C. Faloutsos, and I. Kamel. Declustering spatial databases on a multi-computer architecture. In EDBT, pages 592--614, 1996.
[37]
S. T. Leutenegger, J. M. Edgington, and M. A. Lopez. STR: A simple and efficient algorithm for r-tree packing. Technical report, 1997.
[38]
A. Mondal, M. Kitsuregawa, B. C. Ooi, and K. L. Tan. R-tree-based data migration and self-tuning strategies in shared-nothing spatial databases. In 9th ACM International Symposium on Advances in Geographic Information Systems, pages 28--33, 2001.
[39]
B. Moon, H. V. Jagadish, C. Faloutsos, and J. H. Saltz. Analysis of the clustering properties of the hilbert space-filling curve. IEEE Transactions on Knowledge and Data Engineering, 13(1):124--141, 2001.
[40]
Y. Ohsawa and M. Sakauchi. A new tree type data structure with homogeneous nodes suitable for a very large spatial database. In ICDE, pages 296--303, 1990.
[41]
J. A. Orenstein and T. H. Merrett. A class of data structures for associative searching. In PODS, pages 181--190, 1984.
[42]
A. Papadopoulos and Y. Manolopoulos. Parallel bulk-loading of spatial data. Parallel Computing, 29(10):1419--1444, 2003.
[43]
N. Roussopoulos and D. Leifker. Direct spatial search on pictorial databases using packed r-trees. In SIGMOD, pages 17--31, 1985.
[44]
B. Schnitzer and S. T. Leutenegger. Master-client r-trees: a new parallel r-tree architecture. In SSDBM, pages 68--77, 1999.
[45]
T. K. Sellis, N. Roussopoulos, and C. Faloutsos. The r+-tree: A dynamic index for multi-dimensional objects. In VLDB, pages 507--518, 1987.
[46]
Y. Tao, W. Lin, and X. Xiao. Minimal mapreduce algorithms. In SIGMOD, pages 529--540, 2013.
[47]
P. Xu and S. Tirthapura. Optimality of clustering properties of space-filling curves. ACM Transactions on Database Systems, 39(2):10:1--10:27, 2014.
[48]
S. You, J. Zhang, and L. Gruenwald. Parallel spatial query processing on gpus using r-trees. In 2nd ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data, pages 23--31, 2013.
[49]
M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica. Spark: Cluster computing with working sets. In 2nd USENIX Conference on Hot Topics in Cloud Computing, pages 10--10, 2010.
[50]
R. Zhang, P. Kalnis, B. C. Ooi, and K. Tan. Generalized multidimensional data mapping and query processing. ACM Transactions on Database Systems, 30(3):661--697, 2005.
[51]
R. Zhang, B. C. Ooi, and K.-L. Tan. Making the pyramid technique robust to query types and workloads. In ICDE, pages 313--324, 2004.
[52]
R. Zhang, J. Qi, M. Stradling, and J. Huang. Towards a painless index for spatial objects. ACM Transactions on Database Systems, 39(3):19:1--19:42, 2014.

Cited By

View all

Recommendations

Comments

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment
Proceedings of the VLDB Endowment  Volume 11, Issue 5
January 2018
274 pages
ISSN:2150-8097
Issue’s Table of Contents

Publisher

VLDB Endowment

Publication History

Published: 01 January 2018
Published in PVLDB Volume 11, Issue 5

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)11
  • Downloads (Last 6 weeks)1
Reflects downloads up to 06 Jan 2025

Other Metrics

Citations

Cited By

View all

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media