Approximate Cover of Strings

Authors: Amihood Amir, Avivit Levy, Ronit Lubin, Ely Porat



File

LIPIcs.CPM.2017.26.pdf
  • Filesize: 477 kB
  • 14 pages

Cite As

Amihood Amir, Avivit Levy, Ronit Lubin, and Ely Porat. Approximate Cover of Strings. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 26:1-26:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017) https://doi.org/10.4230/LIPIcs.CPM.2017.26

Abstract

Regularities in strings arise in various areas of science, including coding and automata theory, formal language theory, combinatorics, molecular biology, and many others. A common notion for describing regularity in a string T is a cover: a string C such that every letter of T lies within some occurrence of C. The alignment of the cover repetitions in the given text is called a tiling. In many applications, finding exact repetitions is not sufficient because of the presence of errors. In this paper, we use a new approach for handling errors in coverable phenomena and define the approximate cover problem (ACP), in which we are given a text that is a sequence of some cover repetitions with possible mismatch errors, and we seek a string that covers the text with the minimum number of errors. We first show that the ACP is NP-hard by studying the cover-size relaxation of the ACP, in which the requested size of the approximate cover is given together with the input string. We show that this relaxation is already NP-hard. We also study two further relaxations of the ACP, which we call the partial-tiling relaxation and the full-tiling relaxation, in which a tiling of the requested cover is also given with the input string. A given full tiling retains all the occurrences of the cover before the errors, while in a partial tiling there can be additional occurrences of the cover that are not marked by the tiling. We show that the partial-tiling relaxation can be solved in polynomial time and give experimental evidence that the full-tiling relaxation can also be solved in polynomial time. The study of these relaxations, besides shedding further light on the complexity of the ACP, also involves a deep understanding of the properties of covers, yielding some key lemmas and observations that may be helpful for future study of regularities in the presence of errors.
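
To make the notions of cover and tiling concrete, the following is a minimal Python sketch. It is not taken from the paper and does not solve the ACP. The function names `is_cover` and `tiling_errors` are hypothetical, and the sketch assumes that a tiling is given as a list of start positions and that errors are counted as Hamming mismatches between the text and the string the tiling spells out. The paper's formal definitions may differ in detail.

```python
from typing import List, Optional


def is_cover(text: str, cover: str) -> bool:
    """Return True if every position of text lies inside some exact occurrence of cover."""
    m = len(cover)
    if m == 0 or m > len(text):
        return False
    covered_up_to = 0  # text[:covered_up_to] is known to be covered
    start = text.find(cover)
    while start != -1:
        if start > covered_up_to:
            return False  # a gap that no occurrence reaches
        covered_up_to = start + m
        start = text.find(cover, start + 1)  # allow overlapping occurrences
    return covered_up_to == len(text)


def tiling_errors(text: str, cover: str, starts: List[int]) -> Optional[int]:
    """Count mismatches between text and the string spelled by placing cover at each start.

    Returns None if the tiling is not valid for this cover, that is, if a copy
    falls outside the text, two overlapping copies disagree, or some position
    is left uncovered. (Assumed notion of validity, for illustration only.)
    """
    n, m = len(text), len(cover)
    built: List[Optional[str]] = [None] * n
    for s in starts:
        if s < 0 or s + m > n:
            return None
        for j, ch in enumerate(cover):
            if built[s + j] is not None and built[s + j] != ch:
                return None  # overlapping copies of the cover disagree
            built[s + j] = ch
    if any(ch is None for ch in built):
        return None  # the tiling leaves a position uncovered
    return sum(a != b for a, b in zip(built, text))


if __name__ == "__main__":
    print(is_cover("abaababaaba", "aba"))
    print(tiling_errors("abaabxbaaba", "aba", [0, 3, 5, 8]))
```

In this sketch, a candidate cover together with a tiling determines a fully covered string, and the error count is simply its Hamming distance to the input text. The ACP itself asks for the candidate minimizing that count without being given the tiling.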

Subject Classification

Keywords
  • periodicity
  • quasi-periodicity
  • cover
  • approximate cover
