Document Open Access Logo

Computing All Distinct Squares in Linear Time for Integer Alphabets

Authors Hideo Bannai, Shunsuke Inenaga, Dominik Köppl



PDF
Thumbnail PDF

File

LIPIcs.CPM.2017.22.pdf
  • Filesize: 0.72 MB
  • 18 pages

Document Identifiers

Author Details

Hideo Bannai
Shunsuke Inenaga
Dominik Köppl

Cite As Get BibTex

Hideo Bannai, Shunsuke Inenaga, and Dominik Köppl. Computing All Distinct Squares in Linear Time for Integer Alphabets. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 22:1-22:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017) https://doi.org/10.4230/LIPIcs.CPM.2017.22

Abstract

Given a string on an integer alphabet, we present an algorithm that computes the set of all distinct squares belonging to this string in time linear to the string length. As an application, we show how to compute the tree topology of the minimal augmented suffix tree in linear time. Asides from that, we elaborate an algorithm computing the longest previous table in a succinct representation using compressed working space.

Subject Classification

Keywords
  • tandem repeats
  • distinct squares
  • counting algorithms

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Alberto Apostolico and Franco P. Preparata. Data structures and algorithms for the string statistics problem. Algorithmica, 15(5):481-494, 1996. URL: http://dx.doi.org/10.1007/BF01955046.
  2. Hideo Bannai, Shunsuke Inenaga, and Dominik Köppl. Computing all distinct squares in linear time for integer alphabets, 2016. URL: http://arxiv.org/abs/1610.03421.
  3. Paul Beame and Faith E. Fich. Optimal bounds for the predecessor problem and related problems. J. Comput. Syst. Sci., 65(1):38-72, 2002. URL: http://dx.doi.org/10.1006/jcss.2002.1822.
  4. Anselm Blumer, Janet A. Blumer, David Haussler, Andrzej Ehrenfeucht, M. T. Chen, and Joel I. Seiferas. The smallest automaton recognizing the subwords of a text. Theor. Comput. Sci., 40:31-55, 1985. URL: http://dx.doi.org/10.1016/0304-3975(85)90157-4.
  5. Gerth Stølting Brodal, Rune B. Lyngsø, Anna Östlin, and Christian N. S. Pedersen. Solving the string statistics problem in time O(n log n). In Peter Widmayer, Francisco Triguero Ruiz, Rafael Morales Bueno, Matthew Hennessy, Stephan Eidenbenz, and Ricardo Conejo, editors, Proceedings of the 29th International Colloquium on Automata, Languages, and Programming (ICALP 2002), volume 2380 of LNCS, pages 728-739. Springer, 2002. URL: http://dx.doi.org/10.1007/3-540-45465-9_62.
  6. David R. Clark. Compact Pat Trees. PhD thesis, University of Waterloo, Canada, 1996. URL: http://hdl.handle.net/10012/64.
  7. Richard Cole and Ramesh Hariharan. Dynamic LCA queries on trees. SIAM J. Comput., 34(4), 2005. URL: http://dx.doi.org/10.1137/S0097539700370539.
  8. Maxime Crochemore and Lucian Ilie. Computing longest previous factor in linear time and applications. Inf. Process. Lett., 106(2):75-80, 2008. URL: http://dx.doi.org/10.1016/j.ipl.2007.10.006.
  9. Maxime Crochemore, Lucian Ilie, Costas S. Iliopoulos, Marcin Kubica, Wojciech Rytter, and Tomasz Waleń. LPF computation revisited. In Jirí Fiala, Jan Kratochvíl, and Mirka Miller, editors, Proceedings of the 20th International Workshop on Combinatorial Algorithms (IWOCA 2009), volume 5874 of LNCS, pages 158-169. Springer, 2009. URL: http://dx.doi.org/10.1007/978-3-642-10217-2_18.
  10. Antoine Deza, Frantisek Franek, and Adrien Thierry. How many double squares can a string contain? Discrete Appl. Math., 180:52-69, 2015. URL: http://dx.doi.org/10.1016/j.dam.2014.08.016.
  11. Martin Farach-Colton, Paolo Ferragina, and S. Muthukrishnan. On the sorting-complexity of suffix tree construction. J. ACM, 47(6):987-1011, 2000. URL: http://dx.doi.org/10.1145/355541.355547.
  12. Paolo Ferragina and Gonzalo Navarro. The Pizza & Chili Corpus. Available at http://pizzachili.di.unipi.it and http://pizzachili.dcc.uchile.cl, 2005.
  13. Johannes Fischer. Wee LCP. Inf. Process. Lett., 110(8-9):317-320, 2010. URL: http://dx.doi.org/10.1016/j.ipl.2010.02.010.
  14. Johannes Fischer. Inducing the LCP-array. In Frank Dehne, John Iacono, and Jörg-Rüdiger Sack, editors, Proceedings of the 12th International Symposium on Algorithms and Data Structures (WADS 2011), volume 6844 of LNCS, pages 374-385. Springer, 2011. URL: http://dx.doi.org/10.1007/978-3-642-22300-6_32.
  15. Johannes Fischer and Volker Heun. Space efficient preprocessing schemes for range minimum queries on static arrays. SIAM J. Comput., 40(2):465-492, 2011. URL: http://dx.doi.org/10.1137/090779759.
  16. Johannes Fischer, Tomohiro I, and Dominik Köppl. Lempel-Ziv computation in small space (LZ-CISS). In Ferdinando Cicalese, Ely Porat, and Ugo Vaccaro, editors, Proceedings of the 26th Annual Symposium on Combinatorial Pattern Matching (CPM 2015), volume 9133 of LNCS, pages 172-184. Springer, 2015. URL: http://dx.doi.org/10.1007/978-3-319-19929-0_15.
  17. Aviezri S. Fraenkel and Jamie Simpson. How many squares can a string contain? J. Comb. Theory, Ser. A, 82(1):112-120, 1998. URL: http://dx.doi.org/10.1006/jcta.1997.2843.
  18. Frantisek Franek, Jan Holub, William F. Smyth, and Xiangdong Xiao. Computing quasi suffix arrays. J. Autom. Lang. Comb., 8(4):593-606, 2003. Google Scholar
  19. Harold N. Gabow and Robert Endre Tarjan. A linear-time algorithm for a special case of disjoint set union. J. Comput. Syst. Sci., 30(2):209-221, 1985. URL: http://dx.doi.org/10.1016/0022-0000(85)90014-5.
  20. Simon Gog, Timo Beller, Alistair Moffat, and Matthias Petri. From theory to practice: Plug and play with succinct data structures. In Joachim Gudmundsson and Jyrki Katajainen, editors, Proceedings of the 13th International Symposium on Experimental Algorithms (SEA 2014), volume 8504 of LNCS, pages 326-337. Springer, 2014. URL: http://dx.doi.org/10.1007/978-3-319-07959-2_28.
  21. Keisuke Goto. Optimal time and space construction of suffix arrays and LCP arrays for integer alphabets, 2017. URL: http://arxiv.org/abs/1703.01009.
  22. Dan Gusfield. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, 1997. URL: http://dx.doi.org/10.1017/CBO9780511574931.
  23. Dan Gusfield and Jens Stoye. Linear time algorithms for finding and representing all the tandem repeats in a string. J. Comput. Syst. Sci., 69(4):525-546, 2004. URL: http://dx.doi.org/10.1016/j.jcss.2004.03.004.
  24. Wing-Kai Hon and Kunihiko Sadakane. Space-economical algorithms for finding maximal unique matches. In Alberto Apostolico and Masayuki Takeda, editors, Proceedings of the 13th Annual Symposium on Combinatorial Pattern Matching (CPM 2002), volume 2373 of LNCS, pages 144-152. Springer, 2002. URL: http://dx.doi.org/10.1007/3-540-45452-7_13.
  25. Lucas Chi Kwong Hui. Color set size problem with application to string matching. In Alberto Apostolico, Maxime Crochemore, Zvi Galil, and Udi Manber, editors, Proceedings of the 3rd Annual Symposium on Combinatorial Pattern Matching (CPM 1992), volume 644 of LNCS, pages 230-243. Springer, 1992. URL: http://dx.doi.org/10.1007/3-540-56024-6_19.
  26. Lucian Ilie. A note on the number of squares in a word. Theor. Comput. Sci., 380(3):373-376, 2007. URL: http://dx.doi.org/10.1016/j.tcs.2007.03.025.
  27. Natasa Jonoska, Florin Manea, and Shinnosuke Seki. A stronger square conjecture on binary words. In Viliam Geffert, Bart Preneel, Branislav Rovan, Julius Stuller, and A Min Tjoa, editors, Proceedings of the 40th International Conference on Current Trends in Theory and Practice of Computer Science (SOFSEM 2014), volume 8327 of LNCS, pages 339-350. Springer, 2014. URL: http://dx.doi.org/10.1007/978-3-319-04298-5_30.
  28. Juha Kärkkäinen, Dominik Kempa, and Simon J. Puglisi. Linear time Lempel-Ziv factorization: Simple, fast, small. In Johannes Fischer and Peter Sanders, editors, Proceedings of the 24th Annual Symposium on Combinatorial Pattern Matching (CPM 2013), volume 7922 of LNCS, pages 189-200. Springer, 2013. URL: http://dx.doi.org/10.1007/978-3-642-38905-4_19.
  29. Juha Kärkkäinen, Giovanni Manzini, and Simon John Puglisi. Permuted longest-common-prefix array. In Gregory Kucherov and Esko Ukkonen, editors, Proceedings of the 20th Annual Symposium on Combinatorial Pattern Matching (CPM 2009), volume 5577 of LNCS, pages 181-192. Springer, 2009. URL: http://dx.doi.org/10.1007/978-3-642-02441-2_17.
  30. Juha Kärkkäinen, Peter Sanders, and Stefan Burkhardt. Linear work suffix array construction. J. ACM, 53(6):918-936, 2006. URL: http://dx.doi.org/10.1145/1217856.1217858.
  31. Toru Kasai, Gunho Lee, Hiroki Arimura, Setsuo Arikawa, and Kunsoo Park. Linear-time longest-common-prefix computation in suffix arrays and its applications. In Amihood Amir and Gad M. Landau, editors, Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching (CPM 2001), volume 2089 of LNCS, pages 181-192. Springer, 2001. URL: http://dx.doi.org/10.1007/3-540-48194-X_17.
  32. Pang Ko and Srinivas Aluru. Space efficient linear time construction of suffix arrays. J. Discrete Algorithms, 3(2-4):143-156, 2005. URL: http://dx.doi.org/10.1016/j.jda.2004.08.002.
  33. Dominik Köppl. Computing all distinct squares efficiently, 2017. URL: https://github.com/koeppl/distinct_squares.
  34. Dominik Köppl and Kunihiko Sadakane. Lempel-Ziv computation in compressed space (LZ-CICS). In Ali Bilgin, Michael W. Marcellin, Joan Serra-Sagristà, and James A. Storer, editors, Proceedings of the 2016 Data Compression Conference (DCC 2016), pages 3-12. IEEE Computer Society, 2016. URL: http://dx.doi.org/10.1109/DCC.2016.38.
  35. Zhize Li, Jian Li, and Hongwei Huo. Optimal in-place suffix sorting, 2016. URL: http://arxiv.org/abs/1610.08305.
  36. Udi Manber and Eugene W. Myers. Suffix arrays: A new method for on-line string searches. SIAM J. Comput., 22(5):935-948, 1993. URL: http://dx.doi.org/10.1137/0222058.
  37. Florin Manea and Shinnosuke Seki. Square-density increasing mappings. In Florin Manea and Dirk Nowotka, editors, Proceedings of the 10th International Conference on Combinatorics on Words (WORDS 2015), volume 9304 of LNCS, pages 160-169. Springer, 2015. URL: http://dx.doi.org/10.1007/978-3-319-23660-5_14.
  38. Yuta Mori. libdivsufsort, 2015. URL: https://github.com/y-256/libdivsufsort.
  39. J. Ian Munro, Gonzalo Navarro, and Yakov Nekrich. Space-efficient construction of compressed indexes in deterministic linear time. In Philip N. Klein, editor, Proceedings of the 28th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2017), pages 408-424. SIAM, 2017. URL: http://dx.doi.org/10.1137/1.9781611974782.26.
  40. Enno Ohlebusch and Simon Gog. Lempel-Ziv factorization revisited. In Raffaele Giancarlo and Giovanni Manzini, editors, Proceedings of the 22nd Annual Symposium on Combinatorial Pattern Matching (CPM 2011), volume 6661 of LNCS, pages 15-26. Springer, 2011. URL: http://dx.doi.org/10.1007/978-3-642-21458-5_4.
  41. Kunihiko Sadakane. Compressed suffix trees with full functionality. Theory Comput. Syst., 41(4):589-607, 2007. URL: http://dx.doi.org/10.1007/s00224-006-1198-x.
  42. Yohei Ueki, Diptarama, Masatoshi Kurihara, Yoshiaki Matsuoka, Kazuyuki Narisawa, Ryo Yoshinaka, Hideo Bannai, Shunsuke Inenaga, and Ayumi Shinohara. Longest common subsequence in at least k length order-isomorphic substrings. In Bernhard Steffen, Christel Baier, Mark van den Brand, Johann Eder, Mike Hinchey, and Tiziana Margaria, editors, Proceedings of the 43rd International Conference on Current Trends in Theory and Practice of Computer Science (SOFSEM 2017), volume 10139 of LNCS, pages 363-374. Springer, 2017. URL: http://dx.doi.org/10.1007/978-3-319-51963-0_28.
  43. Esko Ukkonen. On-line construction of suffix trees. Algorithmica, 14(3):249-260, 1995. URL: http://dx.doi.org/10.1007/BF01206331.
  44. Peter Weiner. Linear pattern matching algorithms. In H. Raymond Strong, editor, Proceedings of the 14th Annual Symposium on Switching and Automata Theory (SWAT 1973), pages 1-11. IEEE Computer Society, 1973. URL: http://dx.doi.org/10.1109/SWAT.1973.13.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail