skip to main content
10.1145/1458082.1458172acmconferencesArticle/Chapter ViewAbstractPublication PagescikmConference Proceedingsconference-collections
research-article

Modeling LSH for performance tuning

Published: 26 October 2008 Publication History

Abstract

Although Locality-Sensitive Hashing (LSH) is a promising approach to similarity search in high-dimensional spaces, it has not been considered practical partly because its search quality is sensitive to several parameters that are quite data dependent. Previous research on LSH, though obtained interesting asymptotic results, provides little guidance on how these parameters should be chosen, and tuning parameters for a given dataset remains a tedious process.
To address this problem, we present a statistical performance model of Multi-probe LSH, a state-of-the-art variance of LSH. Our model can accurately predict the average search quality and latency given a small sample dataset. Apart from automatic parameter tuning with the performance model, we also use the model to devise an adaptive LSH search algorithm to determine the probing parameter dynamically for each query. The adaptive probing method addresses the problem that even though the average performance is tuned for optimal, the variance of the performance is extremely high. We experimented with three different datasets including audio, images and 3D shapes to evaluate our methods. The results show the accuracy of the proposed model: the recall errors predicted are within 5% from the real values for most cases; the adaptive search method reduces the standard deviation of recall by about 50% over the existing method.

References

[1]
http://www.cs.princeton.edu/cass.
[2]
M. Bawa, T. Condie, and P. Ganesan. Lsh forest: self-tuning indexes for similarity search. In WWW '05: Proceedings of the 14th international conference on World Wide Web, pages 651--660, New York, NY, USA, 2005. ACM.
[3]
J. L. Bentley. K-d trees for semidynamic point sets. In SCG '90: Proceedings of the sixth annual symposium on Computational geometry, pages 187--197, New York, NY, USA, 1990. ACM.
[4]
M. Datar, N. Immorlica, P. Indyk, and V. S. Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. In SCG '04: Proceedings of the twentieth annual symposium on Computational geometry, pages 253--262, New York, NY, USA, 2004. ACM Press.
[5]
Y. Deng and B. Manjunath. Unsupervised segmentation of color-texture regions in images and video. IEEE Trans. on Pattern Analysis and Machine Intelligence, 23(8):800--810, August 2001.
[6]
D. Dobkin and R. J. Lipton. Multidimensional searching problems. SIAM Journal on Computing, 5(2):181--186, 1976.
[7]
J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, D. S. Pallett, and N. L. Dahlgren. DARPA TIMIT acoustic-phonetic continuous speech corpus, 1993.
[8]
A. Gionis, P. Indyk, and R. Motwani. Similarity search in high dimensions via hashing. In VLDB '99: Proceedings of the 25th International Conference on Very Large Data Bases, pages 518--529, San Francisco, CA, USA, 1999. Morgan Kaufmann Publishers Inc.
[9]
A. Guttman. R-trees: a dynamic index structure for spatial searching. In SIGMOD '84: Proceedings of the 1984 ACM SIGMOD international conference on Management of data, pages 47--57, New York, NY, USA, 1984. ACM.
[10]
P. Indyk and R. Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality. In STOC '98: Proceedings of the thirtieth annual ACM symposium on Theory of computing, pages 604--613, New York, NY, USA, 1998. ACM.
[11]
N. Katayama and S. Satoh. The sr-tree: an index structure for high-dimensional nearest neighbor queries. In SIGMOD '97: Proceedings of the 1997 ACM SIGMOD international conference on Management of data, pages 369--380, New York, NY, USA, 1997. ACM.
[12]
M. Kazhdan, T. Funkhouser, and S. Rusinkiewicz. Rotation invariant spherical harmonic representation of 3d shape descriptors. In SGP '03: Proceedings of the 2003 Eurographics/ACM SIGGRAPH symposium on Geometry processing, pages 156--164, Aire-la-Ville, Switzerland, Switzerland, 2003. Eurographics Association.
[13]
Q. Lv, M. Charikar, and K. Li. Image similarity search with compact data structures. In CIKM '04: Proceedings of the thirteenth ACM international conference on Information and knowledge management, pages 208--217, New York, NY, USA, 2004. ACM.
[14]
Q. Lv, W. Josephson, Z. Wang, M. Charikar, and K. Li. Ferret: a toolkit for content-based similarity search of feature-rich data. In EuroSys '06: Proceedings of the ACM SIGOPS/EuroSys European Conference on Computer Systems 2006, pages 317--330, New York, NY, USA, 2006. ACM.
[15]
Q. Lv, W. Josephson, Z. Wang, M. Charikar, and K. Li. Multi-probe lsh: Efficient indexing for high-dimensional similarity search. In VLDB '07: Proceedings of the 24rd International Conference on Very Large Data Bases, Vienna, Austria, 2007.
[16]
S. Meiser. Point location in arrangements of hyperplanes. Inf. Comput., 106(2):286--303, 1993.
[17]
B.-U. Pagel, F. Korn, and C. Faloutsos. Deflating the dimensionality curse using multiple fractal dimensions. In ICDE '00: Proceedings of the 16th International Conference on Data Engineering, page 589, Washington, DC, USA, 2000. IEEE Computer Society.
[18]
R. Panigrahy. Entropy based nearest neighbor search in high dimensions. In SODA '06: Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm, pages 1186--1195, New York, NY, USA, 2006. ACM.
[19]
G. Tzanetakis and P. Cook. Marsyas: a framework for audio analysis. Organized Sound, 4(3):169--175, 1999.
[20]
Z. Wang, W. Dong, W. Josephson, Q. Lv, M. Charikar, and K. Li. Sizing sketches: a rank-based analysis for similarity search. In SIGMETRICS '07: Proceedings of the 2007 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, pages 157--168, New York, NY, USA, 2007. ACM.
[21]
R. Weber, H.-J. Schek, and S. Blott. A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In VLDB '98: Proceedings of the 24rd International Conference on Very Large Data Bases, pages 194?205, San Francisco, CA, USA, 1998. Morgan Kaufmann Publishers Inc.

Cited By

View all

Index Terms

  1. Modeling LSH for performance tuning

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    CIKM '08: Proceedings of the 17th ACM conference on Information and knowledge management
    October 2008
    1562 pages
    ISBN:9781595939913
    DOI:10.1145/1458082
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 26 October 2008

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. locality sensitive hashing
    2. similarity search

    Qualifiers

    • Research-article

    Conference

    CIKM08
    CIKM08: Conference on Information and Knowledge Management
    October 26 - 30, 2008
    California, Napa Valley, USA

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

    Upcoming Conference

    CIKM '25

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)33
    • Downloads (Last 6 weeks)8
    Reflects downloads up to 06 Jan 2025

    Other Metrics

    Citations

    Cited By

    View all

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media