Article

Exploring bit-difference for approximate KNN search in high-dimensional databases

Authors:

Kian-Lee Tan

ADC '05: Proceedings of the 16th Australasian database conference - Volume 39

Pages 165 - 174

Published: 30 January 2005

Abstract

In this paper, we develop a novel index structure that supports efficient approximate k-nearest neighbor (KNN) queries in high-dimensional databases. In high-dimensional spaces, computing the distance (e.g., the Euclidean distance) between two points accounts for a dominant portion of the overall query response time in main-memory processing. To reduce the number of distance computations, we first propose a structure called BID, which uses BIt-Difference to answer approximate KNN queries. BID represents each dimension of a point's feature vector with a single bit, and uses the number of differing bits to prune points that lie farther from the query. To handle real datasets, which are typically skewed, we enhance the BID mechanism with clustering, a cluster-adapted bit coder, and dimensional weights; the enhanced structure is named BID⁺. Extensive experiments on both real-life and synthetic high-dimensional datasets show that the proposed method yields significant performance advantages over existing index structures.
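The following Python sketch illustrates the general filter-and-refine idea the abstract describes: encode every point with one bit per dimension, rank points by how many bits differ from the query's code, and compute exact Euclidean distances only for the few points that survive. It is not the authors' implementation. The function names `build_index`, `bit_difference`, and `approx_knn` are hypothetical; the per-dimension median used as the bit threshold, the fixed candidate budget `candidates`, and the optional `weights` vector (a rough stand-in for BID⁺'s dimensional weights) are assumptions made for illustration. The clustering and cluster-adapted coding of BID⁺ are not modeled.

```python
import numpy as np


def build_index(data, thresholds=None):
    """Encode every point with one bit per dimension (hypothetical BID-style coder).

    Assumption: a dimension's bit is 1 when the value exceeds that dimension's
    threshold; by default the threshold is the per-dimension median.
    """
    if thresholds is None:
        thresholds = np.median(data, axis=0)
    codes = (data > thresholds).astype(np.uint8)  # shape (n, d)
    return thresholds, codes


def bit_difference(codes, query_code, weights=None):
    """Count the dimensions whose bit differs from the query's bit.

    With `weights` (an illustrative stand-in for dimensional weights), each
    differing dimension contributes its weight instead of 1.
    """
    diff = (codes != query_code).astype(np.float64)  # 1.0 where bits differ
    if weights is None:
        return diff.sum(axis=1)
    return diff @ np.asarray(weights, dtype=np.float64)


def approx_knn(data, codes, thresholds, query, k, candidates, weights=None):
    """Filter by bit difference, then refine survivors with exact Euclidean distance."""
    query_code = (query > thresholds).astype(np.uint8)
    bd = bit_difference(codes, query_code, weights)
    # Keep only the `candidates` points with the smallest bit difference;
    # every other point is pruned without computing its exact distance.
    m = min(candidates, len(data))
    cand = np.argsort(bd, kind="stable")[:m]
    # Exact distances are computed only for the surviving candidates.
    dists = np.linalg.norm(data[cand] - query, axis=1)
    order = np.argsort(dists, kind="stable")[:k]
    return cand[order], dists[order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    points = rng.random((1000, 32))  # synthetic data, for illustration only
    q = rng.random(32)
    thr, bits = build_index(points)
    ids, dists = approx_knn(points, bits, thr, q, k=10, candidates=100)
    print(ids, dists)
```

In this sketch, raising `candidates` computes more exact distances and yields a closer approximation; with `candidates` equal to the number of points, it degenerates to an exact KNN scan.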

Cited By

Y. Shi, H. Cunningham, P. Ruth, and N. Kraft (2010), Towards improving a similarity search approach, Proceedings of the 48th annual ACM Southeast Conference, pp. 1-3, doi:10.1145/1900008.1900076. Online publication date: 15-Apr-2010.
https://dl.acm.org/doi/10.1145/1900008.1900076

Y. Shi and J. McGregor (2009), Towards solving similarity search problems using fuzzy concept for multi-dimensional data, Proceedings of the 47th annual ACM Southeast Conference, pp. 1-3, doi:10.1145/1566445.1566556. Online publication date: 19-Mar-2009.
https://dl.acm.org/doi/10.1145/1566445.1566556

R. Cheng, B. Kao, S. Prabhakar, A. Kwan, Y. Tu, and K. Bratbergsengen (2005), Adaptive stream filters for entity-based queries with non-value tolerance, Proceedings of the 31st international conference on Very large data bases, pp. 37-48, doi:10.5555/1083592.1083600. Online publication date: 30-Aug-2005.
https://dl.acm.org/doi/10.5555/1083592.1083600

Index Terms

1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Unsupervised learning
        1. Cluster analysis
2. Information systems
  1. Data management systems
    1. Database design and models
  2. Information retrieval
    1. Document representation
    2. Search engine architectures and scalability
      1. Search engine indexing

Information

Published In

ADC '05: Proceedings of the 16th Australasian database conference - Volume 39

January 2005

180 pages

ISBN: 192068221X

Editors: Hugh E. Williams (RMIT University, Australia) and Gill Dobbie (University of Auckland, New Zealand)

Publisher

Australian Computer Society, Inc.

Australia

Qualifiers

Article

Conference

ADC '05

01 01 2005

Newcastle, Australia

Acceptance Rates

Overall Acceptance Rate 98 of 224 submissions, 44%

Bibliometrics & Citations

Article Metrics

Total Citations: 3

Total Downloads: 299

Downloads (Last 12 months): 32

Downloads (Last 6 weeks): 9

Reflects downloads up to 04 Feb 2025
