DOI: 10.5555/1082222.1082240

Exploring bit-difference for approximate KNN search in high-dimensional databases

Published: 30 January 2005

Abstract

In this paper, we develop a novel index structure to support efficient approximate k-nearest neighbor (KNN) queries in high-dimensional databases. In high-dimensional spaces, computing the distance (e.g., Euclidean distance) between two points accounts for the dominant portion of the overall query response time in main-memory processing. To reduce the number of distance computations, we first propose a structure (BID) that uses BIt-Difference to answer approximate KNN queries. The BID employs one bit to represent each dimension of a point's feature vector, and the number of differing bits is used to prune the farther points. To handle real datasets, which are typically skewed, we enhance the BID mechanism with clustering, a cluster-adapted bitcoder, and dimensional weights; the enhanced structure is named BID+. Extensive experiments show that our proposed method yields significant performance advantages over existing index structures on both real-life and synthetic high-dimensional datasets.
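
The bit-difference idea in the abstract can be illustrated with a minimal sketch. It is not the authors' BID structure: the abstract does not say how each bit is derived, so the sketch assumes one bit per dimension, set when the coordinate exceeds the per-dimension mean of the dataset. The function names (`encode_bits`, `approx_knn`) and the `n_candidates` parameter are hypothetical. Points are first ranked by the number of bits in which they differ from the query, and exact Euclidean distances are computed only for the closest candidates.

```python
import numpy as np


def encode_bits(points, thresholds):
    """Encode each coordinate as one bit: True where it exceeds the threshold.

    The thresholding rule is an assumption made for illustration only.
    """
    return points > thresholds


def approx_knn(data, query, k, n_candidates):
    """Approximate KNN: filter by bit difference, then refine by Euclidean distance."""
    # Assumed bitcoder: per-dimension mean of the dataset as the threshold.
    thresholds = data.mean(axis=0)
    codes = encode_bits(data, thresholds)
    query_code = encode_bits(query, thresholds)

    # Number of differing bits between each point and the query.
    bit_diff = np.count_nonzero(codes != query_code, axis=1)

    # Keep only the points with the fewest differing bits (the pruning step).
    candidates = np.argsort(bit_diff, kind="stable")[:n_candidates]

    # Exact distances are computed for the surviving candidates only.
    dists = np.linalg.norm(data[candidates] - query, axis=1)
    order = np.argsort(dists, kind="stable")[:k]
    return candidates[order], dists[order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.random((1000, 32))
    query = rng.random(32)
    neighbors, distances = approx_knn(data, query, k=5, n_candidates=100)
    print(neighbors, distances)
```

In this sketch, a smaller `n_candidates` means fewer exact distance computations but a higher chance of missing a true neighbor; setting it to the dataset size reduces the procedure to an exact scan.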

Published In

ADC '05: Proceedings of the 16th Australasian database conference - Volume 39
January 2005
180 pages
ISBN: 192068221X

Publisher

Australian Computer Society, Inc.

Australia

Author Tags

  1. approximate KNN query
  2. bit difference
  3. high-dimensional index structure
  4. memory processing

Qualifiers

  • Article

Conference

ADC '05
01 01 2005
Newcastle, Australia

Acceptance Rates

Overall Acceptance Rate 98 of 224 submissions, 44%
