research-article

Fingerprinting-based Minimal Perfect Hashing Revisited

Published: 26 June 2023

Abstract

This paper studies a fingerprint-based minimal perfect hash function (FMPH for short). Although FMPH is less space-efficient than some other minimal perfect hash functions (for example RecSplit, CHD, or PTHash), it has several practical advantages that make it worth considering. FMPH is simple and quite fast to evaluate. Its construction requires very little auxiliary memory, takes a short time, and can be parallelized or carried out without holding the keys in memory.
We propose an effective method, called FMPHGO, that reduces the size of FMPH, together with a number of implementation improvements. We also study FMPHGO’s performance experimentally and find the best values for its parameters. Our benchmarks show that, with our method and an efficient structure supporting rank queries on a bit vector, the FMPH size can be reduced to about 2.1 bits/key, which is close to the size achieved by state-of-the-art methods and noticeably larger only than that of RecSplit. FMPHGO preserves most of the FMPH advantages mentioned above, but its construction is significantly slower. Nevertheless, FMPHGO’s construction speed remains competitive with methods of similar space efficiency (such as CHD or PTHash) and appears to be good enough for practical applications.
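
The abstract does not spell out how a fingerprint-based minimal perfect hash function works, so the following is only a minimal sketch of the general fingerprinting idea as it is commonly described. It is neither the authors' FMPH implementation nor FMPHGO. The names `_hash`, `build`, and `lookup`, the `gamma` load parameter, and the use of BLAKE2b as the seeded hash are all assumptions made for illustration. At each level, keys are hashed into a bit array. A bit is set only where exactly one key lands, and colliding keys move on to the next level. The value of a key is the rank of its bit among all set bits.

```python
import hashlib


def _hash(key: str, seed: int, size: int) -> int:
    """Hypothetical seeded hash mapping a key into [0, size)."""
    digest = hashlib.blake2b(
        key.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little") % size


def build(keys, gamma=2.0, max_levels=64):
    """Build one bit array per level; gamma (assumed) is bits per remaining key."""
    levels = []
    remaining = list(keys)
    level = 0
    while remaining:
        if level >= max_levels:
            raise ValueError("too many levels (duplicate keys?)")
        size = max(1, int(gamma * len(remaining)))
        counts = [0] * size
        for k in remaining:
            counts[_hash(k, level, size)] += 1
        # A bit is set only where exactly one key landed (no collision).
        levels.append([c == 1 for c in counts])
        # Keys that collided are retried at the next level with a new seed.
        remaining = [k for k in remaining if counts[_hash(k, level, size)] != 1]
        level += 1
    return levels


def lookup(levels, key):
    """Return the index of key in [0, n). Only meaningful for keys used in build."""
    offset = 0  # number of set bits in all earlier levels
    for level, bits in enumerate(levels):
        pos = _hash(key, level, len(bits))
        if bits[pos]:
            # Naive rank query: count set bits before pos in this level.
            return offset + sum(bits[:pos])
        offset += sum(bits)
    return None
```

The rank computation here scans the bit array, which takes linear time. A practical implementation would instead use a succinct rank structure over the concatenated bit vector, which is the kind of structure the abstract refers to when it mentions rank queries on a bit vector.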

Published In

ACM Journal of Experimental Algorithmics, Volume 28
December 2023
325 pages
ISSN:1084-6654
EISSN:1084-6654
DOI:10.1145/3587923

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 26 June 2023
Online AM: 13 May 2023
Accepted: 17 April 2023
Revised: 07 January 2023
Received: 14 July 2022
Published in JEA Volume 28

Author Tags

  1. Minimal perfect hashing
  2. data structures
  3. algorithms

Qualifiers

  • Research-article
