DOI: 10.5555/3039686.3039703

Distance sensitive bloom filters without false negatives

Published: 16 January 2017

Abstract

A Bloom filter is a widely used data structure for representing a set S and answering queries of the form "Is x in S?". By allowing some false positive answers (saying 'yes' when the answer is in fact 'no'), Bloom filters use space significantly below what is required for storing S. In the distance sensitive setting we work with a set S of (Hamming) vectors and seek a data structure that offers a similar trade-off, but answers queries of the form "Is x close to an element of S?" (in Hamming distance). Previous work on distance sensitive Bloom filters has accepted both false positive and false negative answers. Absence of false negatives is of critical importance in many applications of Bloom filters, so it is natural to ask whether this can also be achieved in the distance sensitive setting. Our main contributions are upper and lower bounds (that are tight in several cases) for space usage in the distance sensitive setting where false negatives are not allowed.
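
To make the query semantics concrete, the following is a minimal Python sketch, not the construction from the paper. It pairs a textbook Bloom filter with a naive baseline for the distance sensitive query: to ask "Is x close to an element of S?", it probes the filter with every vector within Hamming distance r of x. Because a Bloom filter never rejects a stored element, this baseline has no false negatives, at the cost of a query time that grows with the size of the Hamming ball. The names `BloomFilter`, `hamming_ball`, and `is_close`, the radius parameter `r`, and the SHA-256 based hashing are all illustrative assumptions.

```python
import hashlib
from itertools import combinations


class BloomFilter:
    """Textbook Bloom filter: false positives possible, false negatives not."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: bytes):
        # Derive one bit position per hash function by salting SHA-256
        # with the hash index (an illustrative choice, not from the paper).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "big") + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def hamming_ball(x: tuple, r: int):
    """Yield every binary vector within Hamming distance r of x, including x."""
    for k in range(r + 1):
        for flips in combinations(range(len(x)), k):
            y = list(x)
            for i in flips:
                y[i] ^= 1
            yield tuple(y)


def is_close(bf: BloomFilter, x: tuple, r: int) -> bool:
    """Naive distance sensitive query: probe the filter with the whole ball.

    If some s in S has distance at most r from x, then s lies in the ball
    around x and the filter accepts it, so a true 'yes' is never missed.
    """
    return any(bytes(y) in bf for y in hamming_ball(x, r))


# Usage: store two 8-bit vectors, then query a vector at distance 1 from the first.
S = [(0, 0, 0, 0, 0, 0, 0, 0), (1, 1, 1, 1, 0, 0, 0, 0)]
bf = BloomFilter(num_bits=1024, num_hashes=4)
for s in S:
    bf.add(bytes(s))
near_query = (1, 0, 0, 0, 0, 0, 0, 0)
answer = is_close(bf, near_query, r=1)  # guaranteed True: no false negatives
```

A query far from every element of S may still be answered 'yes' when one of the probes hits a false positive in the underlying filter, so the false positive rate of this baseline grows with the number of probes. The paper instead studies how little space is needed when false negatives are forbidden.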


Published In

SODA '17: Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms
January 2017
2756 pages

Publisher

Society for Industrial and Applied Mathematics

United States

Qualifiers

  • Research-article

Conference

SODA '17
SODA '17: Symposium on Discrete Algorithms
January 16 - 19, 2017
Barcelona, Spain

Acceptance Rates

Overall Acceptance Rate 411 of 1,322 submissions, 31%
