skip to main content
research-article

Query-aware locality-sensitive hashing for approximate nearest neighbor search

Published: 01 September 2015 Publication History

Abstract

Locality-Sensitive Hashing (LSH) and its variants are the well-known indexing schemes for the c-Approximate Nearest Neighbor (c-ANN) search problem in high-dimensional Euclidean space. Traditionally, LSH functions are constructed in a query-oblivious manner in the sense that buckets are partitioned before any query arrives. However, objects closer to a query may be partitioned into different buckets, which is undesirable. Due to the use of query-oblivious bucket partition, the state-of-the-art LSH schemes for external memory, namely C2LSH and LSB-Forest, only work with approximation ratio of integer c ≥ 2.
In this paper, we introduce a novel concept of query-aware bucket partition which uses a given query as the "anchor" for bucket partition. Accordingly, a query-aware LSH function is a random projection coupled with query-aware bucket partition, which removes random shift required by traditional query-oblivious LSH functions. Notably, query-aware bucket partition can be easily implemented so that query performance is guaranteed. We propose a novel query-aware LSH scheme named QALSH for c-ANN search over external memory. Our theoretical studies show that QALSH enjoys a guarantee on query quality. The use of query-aware LSH function enables QALSH to work with any approximation ratio c > 1. Extensive experiments show that QALSH outperforms C2LSH and LSB-Forest, especially in high-dimensional space. Specifically, by using a ratio c < 2, QALSH can achieve much better query quality.

References

[1]
A. Andoni, P. Indyk, H. L. Nguyen, and I. Razenshteyn. Beyond locality-sensitive hashing. In SODA, pages 1018--1028, 2014.
[2]
M. Datar, N. Immorlica, P. Indyk, and V. S. Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. In SoCG, pages 253--262, 2004.
[3]
R. Fagin, R. Kumar, and D. Sivakumar. Efficient similarity search and classification via rank aggregation. In ACM SIGMOD, pages 301--312, 2003.
[4]
J. Gan, J. Feng, Q. Fang, and W. Ng. Locality-sensitive hashing scheme based on dynamic collision counting. In SIGMOD, pages 541--552, 2012.
[5]
A. Gionis, P. Indyk, R. Motwani, et al. Similarity search in high dimensions via hashing. In VLDB, volume 99, pages 518--529. VLDB Endowment, 1999.
[6]
W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13--30, 1963.
[7]
P. Indyk and R. Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality. In ACM STOC, pages 604--613, 1998.
[8]
H. Jagadish, B. C. Ooi, K.-L. Tan, C. Yu, and R. Zhang. idistance: an adaptive b+-tree based indexing method for nearest neighbor search. ACM TODS, 30(2):364--397, 2005.
[9]
W. Johnson and J. Lindenstrauss. Extensions of lipshitz mapping into hilbert space. Contemporary Mathematics, 26:189--206, 1984.
[10]
J. M. Kleinberg. Two algorithms for nearest-neighbor search in high dimensions. In ACM STOC, pages 599--608, 1997.
[11]
Y. Liu, J. Cui, Z. Huang, H. Li, and H. T. Shen. Sk-lsh: An efficient index structure for approximate nearest neighbor search. VLDB, 7(9), 2014.
[12]
R. Panigrahy. Entropy based nearest neighbor search in high dimensions. In ACM-SIAM SODA, pages 1186--1195, 2006.
[13]
H. Samet. Foundations of multidimensional and metric data structures. Morgan Kaufmann, 2006.
[14]
Y. Sun, W. Wang, J. Qin, Y. Zhang, and X. Lin. Srs: Solving c-approximate nearest neighbor queries in high dimensional euclidean space with a tiny index. VLDB, 8(1), 2014.
[15]
Y. Tao, K. Yi, C. Sheng, and P. Kalnis. Efficient and accurate nearest neighbor and closest pair search in high-dimensional space. ACM TODS, 35(3):20, 2010.

Cited By

View all

Recommendations

Comments

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment
Proceedings of the VLDB Endowment  Volume 9, Issue 1
September 2015
35 pages
ISSN:2150-8097
Issue’s Table of Contents

Publisher

VLDB Endowment

Publication History

Published: 01 September 2015
Published in PVLDB Volume 9, Issue 1

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)105
  • Downloads (Last 6 weeks)15
Reflects downloads up to 26 Dec 2024

Other Metrics

Citations

Cited By

View all

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media