Optimal Las Vegas Approximate Near Neighbors in \(\ell_p\)

Published: 23 January 2022

Abstract

We show that approximate near neighbor search in high dimensions can be solved in a Las Vegas fashion (i.e., without false negatives) for \(\ell_p\) (\(1 \le p \le 2\)) while matching the performance of optimal locality-sensitive hashing. Specifically, we construct a data-independent Las Vegas data structure with query time \(O(dn^{\rho})\) and space usage \(O(dn^{1+\rho})\) for \((r, cr)\)-approximate near neighbors in \(\mathbb{R}^d\) under the \(\ell_p\) norm, where \(\rho = 1/c^p + o(1)\). Furthermore, we give a Las Vegas locality-sensitive filter construction for the unit sphere that can be used with the data-dependent data structure of Andoni et al. (SODA 2017) to achieve optimal space-time tradeoffs in the data-dependent setting. For the symmetric case, this gives us a data-dependent Las Vegas data structure with query time \(O(dn^{\rho})\) and space usage \(O(dn^{1+\rho})\) for \((r, cr)\)-approximate near neighbors in \(\mathbb{R}^d\) under the \(\ell_p\) norm, where \(\rho = 1/(2c^p - 1) + o(1)\).
Our data-independent construction improves on the recent Las Vegas data structure of Ahle (FOCS 2017) for \(\ell_p\) when \(1 < p \le 2\). Our data-dependent construction performs even better for \(\ell_p\) for all \(p \in [1, 2]\) and is the first Las Vegas approximate near neighbors data structure to make use of data-dependent approaches. We also answer open questions of Indyk (SODA 2000), Pagh (SODA 2016), and Ahle by showing that for approximate near neighbors, Las Vegas data structures can match state-of-the-art Monte Carlo data structures in performance for both the data-independent and data-dependent settings and across space-time tradeoffs.
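
As a rough illustration of the two exponents stated in the abstract, the following sketch evaluates only their leading terms, \(1/c^p\) for the data-independent structure and \(1/(2c^p - 1)\) for the data-dependent structure in the symmetric case. The \(o(1)\) terms are dropped, and the function names are hypothetical and chosen for this example only; nothing here comes from the article's own code.

```python
def rho_data_independent(c: float, p: float) -> float:
    """Leading term 1/c^p of the data-independent exponent (o(1) term dropped)."""
    if c <= 1 or not 1 <= p <= 2:
        raise ValueError("expected c > 1 and 1 <= p <= 2")
    return 1.0 / c**p


def rho_data_dependent(c: float, p: float) -> float:
    """Leading term 1/(2c^p - 1) of the data-dependent exponent (o(1) term dropped)."""
    if c <= 1 or not 1 <= p <= 2:
        raise ValueError("expected c > 1 and 1 <= p <= 2")
    return 1.0 / (2.0 * c**p - 1.0)


if __name__ == "__main__":
    # Compare the two leading terms for a few approximation factors c.
    for p in (1.0, 2.0):
        for c in (1.5, 2.0, 3.0):
            ind = rho_data_independent(c, p)
            dep = rho_data_dependent(c, p)
            print(f"p={p:.0f} c={c:.1f} independent={ind:.4f} dependent={dep:.4f}")
```

For instance, with \(c = 2\) and \(p = 2\) the leading terms are \(1/4\) and \(1/7\), so the data-dependent exponent is the smaller of the two, as the abstract indicates.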

References

[1]
Thomas Dybdahl Ahle. 2017. It is NP-hard to Verify an LSF on the Sphere. Technical Report. thomasahle.com.
[2]
Thomas Dybdahl Ahle. 2017. Optimal Las Vegas locality sensitive data structures. In Proceedings of the 58th IEEE Annual Symposium on Foundations of Computer Science (FOCS’17). 938–949.
[3]
Nir Ailon and Bernard Chazelle. 2009. The fast Johnson–Lindenstrauss transform and approximate nearest neighbors. SIAM J. Comput. 39, 1 (2009), 302–322.
[4]
Noga Alon and Shachar Lovett. 2013. Almost k-wise vs. k-wise independent permutations, and uniformity for general group actions. Theory Comput. 9 (2013), 559–577.
[5]
Noga Alon, Dana Moshkovitz, and Shmuel Safra. 2006. Algorithmic construction of sets for k-restrictions. ACM Trans. Algorithms 2, 2 (2006), 153–177.
[6]
Alexandr Andoni and Piotr Indyk. 2006. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. In Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS’06). 459–468.
[7]
Alexandr Andoni and Piotr Indyk. 2017. Nearest neighbors in high-dimensional spaces. In Handbook of Discrete and Computational Geometry (3rd ed.), Jacob E. Goodman, Joseph O’Rourke, and Csaba D. Tóth (Eds.). CRC Press, Boca Raton, FL, 1135–1155.
[8]
Alexandr Andoni, Piotr Indyk, Thijs Laarhoven, Ilya P. Razenshteyn, and Ludwig Schmidt. 2015. Practical and optimal LSH for angular distance. In Advances in Neural Information Processing Systems 28. 1225–1233. http://papers.nips.cc/paper/5893-practical-and-optimal-lsh-for-angular-distance.
[9]
Alexandr Andoni, Piotr Indyk, Huy L. Nguyen, and Ilya P. Razenshteyn. 2014. Beyond locality-sensitive hashing. In Proceedings of the 25th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’14). 1018–1028.
[10]
Alexandr Andoni, Piotr Indyk, and Ilya P. Razenshteyn. 2018. Approximate nearest neighbor search in high dimensions. CoRR abs/1806.09823 (2018). arXiv:1806.09823 http://arxiv.org/abs/1806.09823.
[11]
Alexandr Andoni, Thijs Laarhoven, Ilya P. Razenshteyn, and Erik Waingarten. 2017. Optimal hashing-based time-space trade-offs for approximate near neighbors. In Proceedings of the 28th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’17). 47–66.
[12]
Alexandr Andoni and Ilya P. Razenshteyn. 2015. Optimal data-dependent hashing for approximate near neighbors. In Proceedings of the 47th Annual ACM Symposium on Theory of Computing (STOC’15). 793–801.
[13]
Alexandr Andoni and Ilya P. Razenshteyn. 2016. Tight lower bounds for data-dependent locality-sensitive hashing. In Proceedings of the 32nd International Symposium on Computational Geometry (SoCG’16). Article 9, 11 pages.
[14]
Arvind Arasu, Venkatesh Ganti, and Raghav Kaushik. 2006. Efficient exact set-similarity joins. In Proceedings of the 32nd International Conference on Very Large Data Bases. 918–929. http://dl.acm.org/citation.cfm?id=1164206.
[15]
Anja Becker, Léo Ducas, Nicolas Gama, and Thijs Laarhoven. 2016. New directions in nearest neighbor searching with applications to lattice sieving. In Proceedings of the 27th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’16). 10–24.
[16]
Andrei Z. Broder, Steven C. Glassman, Mark S. Manasse, and Geoffrey Zweig. 1997. Syntactic clustering of the web. Comput. Netw. 29, 8–13 (1997), 1157–1166.
[17]
Moses Charikar, Kevin C. Chen, and Martin Farach-Colton. 2004. Finding frequent items in data streams. Theor. Comput. Sci. 312, 1 (2004), 3–15.
[18]
Tobias Christiani. 2017. A framework for similarity search with space-time tradeoffs using locality-sensitive filtering. In Proceedings of the 28th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’17). 31–46.
[19]
Tobias Christiani and Rasmus Pagh. 2017. Set similarity search beyond MinHash. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing (STOC’17). 1094–1107.
[20]
Kenneth L. Clarkson. 1988. A randomized algorithm for closest-point queries. SIAM J. Comput. 17, 4 (1988), 830–847.
[21]
Michael B. Cohen, T. S. Jayram, and Jelani Nelson. 2018. Simple analyses of the sparse Johnson-Lindenstrauss transform. In Proceedings of the 1st Symposium on Simplicity in Algorithms (SOSA’18). Article 15, 9 pages.
[22]
Michael B. Cohen, Jelani Nelson, and David P. Woodruff. 2016. Optimal approximate matrix product in terms of stable rank. In Proceedings of the 43rd International Colloquium on Automata, Languages, and Programming (ICALP’16). Article 11, 14 pages.
[23]
Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab S. Mirrokni. 2004. Locality-sensitive hashing scheme based on p-stable distributions. In Proceedings of the 20th ACM Symposium on Computational Geometry. 253–262.
[24]
Aristides Gionis, Piotr Indyk, and Rajeev Motwani. 1999. Similarity search in high dimensions via hashing. In Proceedings of the 25th International Conference on Very Large Data Bases (VLDB’99). 518–529. http://www.vldb.org/conf/1999/P49.pdf.
[25]
Mayank Goswami, Rasmus Pagh, Francesco Silvestri, and Johan Sivertsen. 2017. Distance sensitive Bloom filters without false negatives. In Proceedings of the 28th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’17). 257–269.
[26]
Sariel Har-Peled, Piotr Indyk, and Rajeev Motwani. 2012. Approximate nearest neighbor: Towards removing the curse of dimensionality. Theory Comput. 8, 1 (2012), 321–350.
[27]
Wassily Hoeffding. 1963. Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc. 58, 301 (1963), 13–30. http://www.jstor.org/stable/2282952.
[28]
Piotr Indyk. 2000. Dimensionality reduction techniques for proximity problems. In Proceedings of the 11th Annual ACM-SIAM Symposium on Discrete Algorithms. 371–378. http://dl.acm.org/citation.cfm?id=338219.338582.
[29]
William Johnson and Joram Lindenstrauss. 1984. Extensions of Lipschitz maps into a Hilbert space. Contemp. Math. 26 (1984), 189–206.
[30]
Matti Karppa, Petteri Kaski, Jukka Kohonen, and Padraig Ó Catháin. 2016. Explicit correlation amplifiers for finding outlier correlations in deterministic subquadratic time. In Proceedings of the 24th Annual European Symposium on Algorithms (ESA’16). Article 52, 17 pages.
[31]
Eyal Kushilevitz, Rafail Ostrovsky, and Yuval Rabani. 2000. Efficient search for approximate nearest neighbor in high dimensional spaces. SIAM J. Comput. 30, 2 (2000), 457–474.
[32]
Stefan Meiser. 1993. Point location in arrangements of hyperplanes. Inf. Comput. 106, 2 (1993), 286–303.
[33]
Rajeev Motwani, Assaf Naor, and Rina Panigrahy. 2007. Lower bounds on locality sensitive hashing. SIAM J. Discrete Math. 21, 4 (2007), 930–935.
[34]
Moni Naor, Leonard J. Schulman, and Aravind Srinivasan. 1995. Splitters and near-optimal derandomization. In Proceedings of the 36th Annual Symposium on Foundations of Computer Science. 182–191.
[35]
Huy L. Nguyen. 2014. Algorithms for High Dimensional Data. Ph.D. Dissertation. Princeton University.
[36]
Ryan O’Donnell, Yi Wu, and Yuan Zhou. 2014. Optimal lower bounds for locality-sensitive hashing (except when q is tiny). ACM Trans. Comput. Theory 6, 1 (2014), Article 5, 13 pages.
[37]
Rasmus Pagh. 2016. Locality-sensitive hashing without false negatives. In Proceedings of the 27th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’16). 1–9.
[38]
Ninh Pham and Rasmus Pagh. 2016. Scalability and total recall with fast CoveringLSH. In Proceedings of the 25th ACM International Conference on Information and Knowledge Management (CIKM’16). 1109–1118.
[39]
Piotr Sankowski and Piotr Wygocki. 2017. Approximate nearest neighbors search without false negatives for \(\ell_2\) for \(c > \sqrt{\log \log n}\). In Proceedings of the 28th International Symposium on Algorithms and Computation (ISAAC’17). Article 63, 12 pages.
[40]
Piotr Wygocki. 2017. Improved approximate near neighbor search without false negatives for \(\ell_2\). CoRR abs/1709.10338 (2017). arXiv:1709.10338 http://arxiv.org/abs/1709.10338.

Information

Published In

ACM Transactions on Algorithms  Volume 18, Issue 1
January 2022
281 pages
ISSN: 1549-6325
EISSN: 1549-6333
DOI: 10.1145/3492455
  • Editor: Edith Cohen

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 23 January 2022
Accepted: 01 April 2021
Revised: 01 February 2021
Received: 01 March 2019
Published in TALG Volume 18, Issue 1

Author Tags

  1. Approximate near neighbor search
  2. locality-sensitive hashing
  3. similarity search
  4. recall

Qualifiers

  • Research-article
  • Refereed

Funding Sources

  • Harvard PRISE Fellowship and a Herchel Smith Fellowship
