Spatial join selectivity using power laws

Published: 16 May 2000
DOI: 10.1145/342009.335412

Abstract

We discovered a surprising law governing spatial join selectivity across two sets of points. An example of such a spatial join is “find the libraries that are within 10 miles of schools”. Our law states that the number of qualifying pairs, as a function of the distance threshold, follows a power law, whose exponent we call the “pair-count exponent” (PC). We show that this law holds not only in the general case where the two point sets are distinct, but also for self-spatial-joins (“find schools within 5 miles of other schools”). Our law holds for many real datasets from diverse environments (geographic datasets, feature vectors from biology data, galaxy data from astronomy).
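
Read literally, the law says the pair count behaves like count(r) ≈ K · r^e, where r is the distance threshold, e is the pair-count exponent, and K is a constant (these symbols are chosen here for illustration). The following is a minimal sketch of the slower, quadratic way to measure e. It assumes Euclidean distance, two distinct point sets, and an exponent estimated as the least-squares slope of log count(r) against log r. The function names `pair_count` and `fit_power_law` are hypothetical and do not come from the paper. A self-join would count distinct pairs p ≠ q instead; that variant is omitted.

```python
import math
import random
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def pair_count(a: Sequence[Point], b: Sequence[Point], r: float) -> int:
    """Number of pairs (p, q), p from a and q from b, with Euclidean distance <= r.

    Brute force: |a| * |b| distance computations, i.e. quadratic time.
    """
    return sum(1 for p in a for q in b if math.dist(p, q) <= r)


def fit_power_law(radii: Sequence[float], counts: Sequence[int]) -> Tuple[float, float]:
    """Least-squares fit of log(count) = log(K) + e * log(r); returns (e, K).

    Radii with a zero count are skipped, since log(0) is undefined.
    """
    xs = [math.log(r) for r, c in zip(radii, counts) if c > 0]
    ys = [math.log(c) for c in counts if c > 0]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    exponent = sxy / sxx
    constant = math.exp(mean_y - exponent * mean_x)
    return exponent, constant


if __name__ == "__main__":
    # Synthetic data: uniform points in the unit square (not one of the paper's datasets).
    rng = random.Random(0)
    a = [(rng.random(), rng.random()) for _ in range(400)]
    b = [(rng.random(), rng.random()) for _ in range(400)]
    radii = [0.02, 0.04, 0.08, 0.16]
    counts = [pair_count(a, b, r) for r in radii]
    e, k = fit_power_law(radii, counts)
    print(f"pair counts {counts}; fitted exponent {e:.2f}")
```

On uniform 2-D points the fitted exponent should come out close to 2, slightly below it because of boundary effects near the edges of the square. This makes a useful sanity check before the fit is applied to real, skewed data.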
In addition, we introduce the concept of the Box-Occupancy-Product-Sum (BOPS) plot, and we show that it yields the pair-count exponent efficiently, reducing the run time by orders of magnitude, from quadratic to linear. Thanks to the pair-count exponent and our analysis (Law 1), we can achieve accurate selectivity estimates in constant time (O(1)), without sampling or other expensive operations. The relative error in selectivity is about 30% with our fast BOPS method, and better still (about 10%) with the slower, quadratic method.
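
The following is a minimal sketch of the BOPS idea. It assumes Euclidean points, a square grid of side r anchored at the origin, and a BOPS value equal to the sum, over grid cells, of the product of the two sets' cell occupancies. The pair-count exponent is then read off as the least-squares slope of log BOPS(r) against log r. Each grid is built in one pass over each point set, so the cost per grid size is linear rather than quadratic. The names `box_counts`, `bops`, `loglog_slope`, and `estimate_selectivity` are hypothetical. The last function shows why an estimate costs O(1) once the exponent is known. Calibrating the power-law constant from a single reference count `pairs_at_r0` at radius `r0` is an assumption of this sketch, not necessarily how the paper obtains the constant.

```python
import math
import random
from collections import Counter
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def box_counts(points: Sequence[Point], side: float) -> Counter:
    """Occupancy of every non-empty grid cell of the given side length (one pass)."""
    return Counter(tuple(math.floor(x / side) for x in p) for p in points)


def bops(a: Sequence[Point], b: Sequence[Point], side: float) -> int:
    """Box-Occupancy-Product-Sum: sum over cells of count_a(cell) * count_b(cell)."""
    counts_a, counts_b = box_counts(a, side), box_counts(b, side)
    # Walk the smaller table; a Counter returns 0 for cells it has never seen.
    if len(counts_a) > len(counts_b):
        counts_a, counts_b = counts_b, counts_a
    return sum(n * counts_b[cell] for cell, n in counts_a.items())


def loglog_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Least-squares slope of log(y) against log(x), skipping non-positive y."""
    pts = [(math.log(x), math.log(y)) for x, y in zip(xs, ys) if y > 0]
    mean_x = sum(x for x, _ in pts) / len(pts)
    mean_y = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return sxy / sxx


def estimate_selectivity(r: float, exponent: float, r0: float,
                         pairs_at_r0: float, size_a: int, size_b: int) -> float:
    """O(1) estimate: pairs(r) ~ pairs_at_r0 * (r / r0) ** exponent, divided by |A| * |B|.

    Calibrating the constant from one reference count at r0 is an assumption of this sketch.
    """
    return pairs_at_r0 * (r / r0) ** exponent / (size_a * size_b)


if __name__ == "__main__":
    # Synthetic data: uniform points in the unit square (not one of the paper's datasets).
    rng = random.Random(1)
    a = [(rng.random(), rng.random()) for _ in range(1000)]
    b = [(rng.random(), rng.random()) for _ in range(1000)]
    sides = [1 / 64, 1 / 32, 1 / 16, 1 / 8]
    exponent = loglog_slope(sides, [bops(a, b, s) for s in sides])
    print(f"BOPS exponent ~ {exponent:.2f}")
```

For uniform points in the unit square and a grid side r with 1/r an integer, the expected BOPS value is |A| · |B| · r², so the fitted slope should land near 2. Only the slope of the BOPS plot is used here. The BOPS values themselves need not equal pair counts, which is why the sketch takes a separate reference count to turn the exponent into a selectivity.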


Published In

SIGMOD '00: Proceedings of the 2000 ACM SIGMOD international conference on Management of data
May 2000
604 pages
ISBN:1581132174
DOI:10.1145/342009
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


Publisher

Association for Computing Machinery

New York, NY, United States


Conference

SIGMOD/PODS '00

Acceptance Rates

SIGMOD '00 paper acceptance rate: 42 of 248 submissions (17%)
Overall acceptance rate: 785 of 4,003 submissions (20%)


Article Metrics

  • Downloads (last 12 months): 94
  • Downloads (last 6 weeks): 12
Reflects downloads up to 27 Dec 2024
