Self-spatial join selectivity estimation using fractal concepts

Published: 01 April 1998

Abstract

The problem of selectivity estimation for queries of nontraditional databases is still an open issue. In this article, we examine the problem of selectivity estimation for some types of spatial queries in databases containing real data. We have shown earlier [Faloutsos and Kamel 1994] that real point sets typically have a nonuniform distribution, consistently violating the uniformity and independence assumptions. Moreover, we demonstrated that the theory of fractals can help to describe real point sets. In this article we show how the concept of fractal dimension, i.e., a (noninteger) dimension, can lead to a solution of the selectivity estimation problem in spatial databases. Among the infinite family of fractal dimensions, we consider here the Hausdorff fractal dimension D0 and the “Correlation” fractal dimension D2. Specifically, we show that (a) the average number of neighbors for a given point set follows a power law, with D2 as exponent, and (b) the average number of nonempty range queries follows a power law with E − D0 as exponent (E is the dimension of the embedding space). We present formulas to estimate the selectivity for “biased” range queries, for self-spatial joins, and for the average number of nonempty range queries. The results of experiments on real and synthetic point sets are shown. Our formulas achieve very low relative errors, typically about 10%, versus 40%–100% for the formulas that are based on the uniformity and independence assumptions.
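Claim (a) in the abstract can be illustrated with a small sketch. The Python below is not the authors' implementation and does not reproduce the article's formulas; it only assumes the stated power law, nb(r) ∝ r^D2, and estimates D2 as the slope of log(average neighbors) against log(r). The function names, the brute-force pair counting, the L-infinity distance, and the chosen radii are all assumptions made for illustration.

```python
import math
import random


def avg_neighbors(points, r):
    """Average number of other points within L-infinity distance r of a point.

    Brute-force O(n^2) pair counting; the distance metric is an assumption.
    """
    n = len(points)
    close_pairs = 0
    for i in range(n):
        xi, yi = points[i]
        for j in range(i + 1, n):
            xj, yj = points[j]
            if max(abs(xi - xj), abs(yi - yj)) <= r:
                close_pairs += 1
    # Each close pair contributes one neighbor to both of its endpoints.
    return 2.0 * close_pairs / n


def estimate_d2(points, radii):
    """Least-squares slope of log(avg_neighbors) versus log(r)."""
    xs, ys = [], []
    for r in radii:
        nb = avg_neighbors(points, r)
        if nb <= 0:
            raise ValueError(f"no neighbors within r={r}; use larger radii")
        xs.append(math.log(r))
        ys.append(math.log(nb))
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def extrapolate_neighbors(nb_at_r0, r0, r, d2):
    """Power-law extrapolation: nb(r) = nb(r0) * (r / r0) ** d2."""
    return nb_at_r0 * (r / r0) ** d2


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic points on a diagonal line embedded in 2-D (E = 2).
    line = [(t, t) for t in (rng.random() for _ in range(400))]
    radii = [0.01, 0.02, 0.04, 0.08]
    d2 = estimate_d2(line, radii)
    print("estimated D2 for points on a line:", d2)
    # Use one measured radius plus D2 to predict another radius.
    nb_small = avg_neighbors(line, 0.01)
    print("predicted neighbors at r=0.08:",
          extrapolate_neighbors(nb_small, 0.01, 0.08, d2))
    print("measured neighbors at r=0.08:", avg_neighbors(line, 0.08))
```

For points lying on a line the fitted exponent should come out near 1, and for points spread uniformly over a square near 2, although boundary effects pull the estimate slightly below the ideal value. Under these assumptions, a self-join result size for distance r would be estimated by scaling the average neighbor count by the number of points.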

References

[1]
AGRAWAL, R. AND SRIKANT, R. 1994. Fast algorithms for mining association rules in large databases. In Proceedings of the 20th Conference on Very Large Data Bases (Santiago, Chile, Sept. 1994). VLDB Endowment, Berkeley, CA, 487-499.
[2]
AGRAWAL, R., IMIELINSKI, T., AND SWAMI, A. 1993. Mining association rules between sets of items in large databases. In Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data (Washington, DC, May 26-28), P. Buneman and S. Jajodia, Eds. ACM Press, New York, NY, 207-216.
[3]
AREF, W. G. AND SAMET, H. 1991. Optimization strategies for spatial query processing. In Proceedings of the 17th Conference on Very Large Data Bases (Barcelona, Spain, Sept.). VLDB Endowment, Berkeley, CA, 81-90.
[4]
ARYA, M., CODY, W., FALOUTSOS, C., RICHARDSON, J., AND TOGA, A. 1993. Qbism: A prototype 3-d medical imaging database system. IEEE Data Eng. Tech. Bull. 16, 1 (Mar.), 38-42.
[5]
BARNSLEY, M. F. AND SLOAN, A. D. 1988. A better way to compress images. BYTE 13, 1 (Jan.), 215-223.
[6]
BECKMANN, N., KRIEGEL, H.-P., SCHNEIDER, R., AND SEEGER, B. 1990. The R*-tree: An efficient and robust access method for points and rectangles. In Proceedings of the 1990 ACM SIGMOD International Conference on Management of Data (Atlantic City, NJ, May 23-25). ACM Press, New York, NY, 322-331.
[7]
CASTAGLI, M. AND EUBANK, S., EDS. 1992. Nonlinear Modeling and Forecasting. SFI Studies in the Science of Complexity, vol. 12. Addison-Wesley Publishing Co., Reading, MA.
[8]
CHRISTODOULAKIS, S. 1984. Implications of certain assumptions in database performance evaluation. ACM Trans. Database Syst. 9, 2 (June), 163-186.
[9]
DEWITT, D. J., KABRA, N., PATEL, J. M., AND YU, J.-B. 1994. Client-server paradise. In Proceedings of the 20th Conference on Very Large Data Bases (Santiago, Chile, Sept. 1994). VLDB Endowment, Berkeley, CA.
[10]
FALOUTSOS, C. 1992. Analytical results on the quadtree decomposition of arbitrary rectangles. Pattern Recogn. Lett. 13, 1 (Jan.), 31-40.
[11]
FALOUTSOS, C. AND JAGADISH, H.V. 1992. On b-tree indices for skewed distributions. In Proceedings of the 18th Conference on Very Large Data Bases. VLDB Endowment, Berkeley, CA, 363-374.
[12]
FALOUTSOS, C. AND KAMEL, I. 1993. Packed r-trees using fractals. Tech. Rep. TR-93-1, University of Maryland, College Park, MD.
[13]
FALOUTSOS, C. AND KAMEL, I. 1994. Beyond uniformity and independence: Analysis of R-trees using the concept of fractal dimension. In Proceedings of the 13th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (Minneapolis, MN, May 24-26, 1994). ACM Press, New York, NY, 4-13.
[14]
FALOUTSOS, C., BARBER, R., FLICKNER, M., HAFNER, J., NIBLACK, W., PETKOVIC, D., AND EQUITZ, W. 1994a. Efficient and effective querying by image content. J. Intell. Inf. Syst. 3, 3/4 (July), 231-262.
[15]
FALOUTSOS, C., RANGANATHAN, M., AND MANOLOPOULOS, Y. 1994b. Fast subsequence matching in time-series databases. In Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data (Minneapolis, MN, May 24-27, 1994), R. T. Snodgrass and M. Winslett, Eds. ACM Press, New York, NY, 419-429.
[16]
FALOUTSOS, C., SELLIS, T., AND ROUSSOPOULOS, N. 1987. Analysis of object oriented spatial access methods. In Proceedings of the ACM SIGMOD International Conference on the Management of Data (San Francisco, CA, May 27-29, 1987), U. Dayal, Ed. ACM Press, New York, NY, 426-439.
[17]
GRASSBERGER, P. 1990. An optimized box-assisted algorithm for fractal dimensions. Physics Letters A 148, 1/2, 63-68.
[18]
GUTTMAN, A. 1984. R-trees: A dynamic index structure for spatial searching. In SIGMOD '84: Proceedings of the Annual Meeting (Boston, MA, June 18-21, 1984). ACM, New York, NY, 47-57.
[19]
KAMEL, I. AND FALOUTSOS, C. 1994. Hilbert R-tree: An improved R-tree using fractals. In Proceedings of the 20th Conference on Very Large Data Bases (Santiago, Chile, Sept. 1994). VLDB Endowment, Berkeley, CA, 500-509.
[20]
IOANNIDIS, Y. E. AND CHRISTODOULAKIS, S. 1991. On the propagation of errors in the size of join results. In Proceedings of the 1991 ACM SIGMOD International Conference on Management of Data (Denver, CO, May 29-31, 1991), J. Clifford and R. King, Eds. ACM Press, New York, NY, 268-277.
[21]
IOANNIDIS, Y. E. AND CHRISTODOULAKIS, S. 1993. Optimal histograms for limiting worst-case error propagation in the size of join results. ACM Trans. Database Syst. 18, 4 (Dec.), 709-748.
[22]
MANDELBROT, B. 1977. Fractal Geometry of Nature. W.H. Freeman & Co., New York, NY.
[23]
MURALIKRISHNA, M. AND DEWITT, D.J. 1988. Equi-depth histograms for estimating selectivity factors for multi-dimensional queries. In Proceedings of the Conference on Management of Data (Chicago, IL, June 1-3, 1988), H. Boral and P.-A. Larson, Eds. ACM Press, New York, NY, 28-36.
[24]
NELSON, R. AND SAMET, H. 1986. A population analysis of quadtrees with variable node size. Tech. Rep. CAR-TR-241 (also CS-TR-1740, DCR-86-05557), Computer Science Department, University of Maryland, College Park, MD.
[25]
ORENSTEIN, J. 1986. Spatial query processing in an object-oriented database system. In Proceedings of the Conference on Management of Data (Washington, D.C., May 28-30, 1986), C. Zaniolo, Ed. ACM Press, New York, NY, 326-336.
[26]
ORENSTEIN, J.A. 1990. A comparison of spatial query processing techniques for native and parameter spaces. In Proceedings of the 1990 ACM SIGMOD International Conference on Management of Data (Atlantic City, NJ, May 23-25). ACM Press, New York, NY, 343-352.
[27]
PAGEL, B.-U., SIX, H.-W., TOBEN, H., AND WIDMAYER, P. 1993. Towards an analysis of range query performance in spatial data structures. In Proceedings of the 12th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (Washington, DC, May 25-28, 1993). ACM Press, New York, NY, 214-221.
[28]
PAPADOPOULOS, A. AND MANOLOPOULOS, Y. 1997. Performance of nearest neighbor queries in r-trees. In Proceedings of the 6th International Conference on Database Theory.
[29]
ROUSSOPOULOS, N., KELLEY, S., AND VINCENT, F. 1995. Nearest neighbor queries. In Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data (San Jose, CA, May 23-25, 1995), M. Carey and D. Schneider, Eds. ACM Press, New York, NY, 71-79.
[30]
SAMET, H. 1990. Applications of Spatial Data Structures: Computer Graphics, Image Processing, and GIS. Addison-Wesley Series in Computer Science. Addison-Wesley Longman Publ. Co., Inc., Reading, MA.
[31]
SCHROEDER, M. 1991. Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise. W. H. Freeman & Co., New York, NY.
[32]
SCHUSTER, H. G. 1988. Deterministic Chaos. VCH Publishers, New York, NY.
[33]
SELINGER, P. G., ASTRAHAN, M. M., LORIE, R. A., AND PRICE, T.G. 1979. Access path selection in a relational database management system. In Proceedings of the ACM-SIGMOD 1979 International Conference on Management of Data (Boston, MA, May 30-June 1, 1979). ACM, New York, NY, 23-34.
[34]
SMITH, R.L. 1992. Optimal estimation of fractal dimension. In Nonlinear Modeling and Forecasting, M. Castagli and S. Eubank, Eds. SFI Studies in the Science of Complexity, vol. 12. Addison-Wesley Publishing Co., Reading, MA, 115-135.
[35]
STONEBRAKER, M., FREW, J., GARDELS, K., AND MEREDITH, J. 1993. The Sequoia 2000 storage benchmark. In Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data (Washington, DC, May 26-28), P. Buneman and S. Jajodia, Eds. ACM Press, New York, NY.
[36]
ZIPF, G.K. 1949. Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology. Addison-Wesley Publishing Co., Reading, MA.

Published In

ACM Transactions on Information Systems, Volume 16, Issue 2
April 1998
101 pages
ISSN:1046-8188
EISSN:1558-2868
DOI:10.1145/279339

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. fractal dimension
  2. range query
  3. selectivity estimation
  4. spatial join
