On Optimizing Nearest Neighbor Queries in High-Dimensional Data Spaces

Published: 04 January 2001

Abstract

Nearest neighbor queries in high-dimensional space are important in many applications, especially in content-based indexing of multimedia data. Optimizing query processing requires accurate models for estimating query processing costs. In this paper, we propose a new cost model for nearest neighbor queries in high-dimensional space and apply it to improve the performance of high-dimensional index structures. The model is based on new insights into effects that occur in high-dimensional space, and it provides a closed formula for the processing cost of nearest neighbor queries as a function of the dimensionality, the block size, and the database size. From the wide range of possible applications of our model, we select two examples: first, we use the model to prove the known linear complexity of the nearest neighbor search problem in high-dimensional space; second, we provide a technique for optimizing the block size. For data of medium dimensionality, the optimized block size yields significant speed-ups in query processing time compared with both traditional block sizes and the linear scan.

Published In

ICDT '01: Proceedings of the 8th International Conference on Database Theory
January 2001
449 pages
ISBN: 3540414568

Publisher

Springer-Verlag

Berlin, Heidelberg
