DOI:10.1145/3519935.3520061
research-article

Dynamic suffix array with polylogarithmic queries and updates

Published: 10 June 2022

Abstract

The suffix array $\mathrm{SA}[1..n]$ of a text $T$ of length $n$ is a permutation of $\{1, \ldots, n\}$ describing the lexicographical ordering of suffixes of $T$, and it is considered to be one of the most important data structures for string processing, with dozens of applications in data compression, bioinformatics, and information retrieval. One of the biggest drawbacks of the suffix array is that it is very difficult to maintain under text updates: even a single character substitution can completely change the contents of the suffix array. Thus, the suffix array of a dynamic text is modelled using suffix array queries, which return the value $\mathrm{SA}[i]$ given any $i \in [1..n]$.
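To make the point about substitutions concrete, here is a minimal Python sketch. It uses a naive construction (sorting suffixes directly) chosen for clarity rather than speed, and the helper names `suffix_array` and `changed_entries` are illustrative. Positions are 1-based to match $\mathrm{SA}[1..n]$. Substituting only the last character of $T = \texttt{aaaaaaaa}$ changes every entry of the suffix array.

```python
def suffix_array(text: str) -> list[int]:
    """Return SA[1..n] as a list of 1-based suffix start positions.

    Naive construction for illustration: sort start positions by the suffix
    text itself (roughly O(n^2 log n) character comparisons in the worst case).
    """
    return [i + 1 for i in sorted(range(len(text)), key=lambda i: text[i:])]


def changed_entries(before: list[int], after: list[int]) -> int:
    """Count indices i at which SA[i] differs between two equal-length arrays."""
    return sum(1 for x, y in zip(before, after) if x != y)


if __name__ == "__main__":
    t = "a" * 8
    t_sub = t[:-1] + "b"  # a single character substitution at position n
    sa, sa_sub = suffix_array(t), suffix_array(t_sub)
    print(sa)                           # [8, 7, 6, 5, 4, 3, 2, 1]
    print(sa_sub)                       # [1, 2, 3, 4, 5, 6, 7, 8]
    print(changed_entries(sa, sa_sub))  # 8
```

In `aaaaaaaa` a shorter suffix is a prefix of every longer one, so suffixes sort from shortest to longest; after the substitution each suffix other than `b` has the form `a...ab`, and the longer ones now sort first, reversing the whole array.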
Prior to this work, the fastest dynamic suffix array implementations were by Amir and Boneh, who showed how to answer suffix array queries in $\tilde{O}(k)$ time, where $k \in [1..n]$ is a trade-off parameter, with $\tilde{O}(n/k)$-time text updates [ISAAC 2020]. In a very recent preprint, they also provided a solution with $O(\log^5 n)$-time queries and $\tilde{O}(n^{2/3})$-time updates [arXiv 2021].
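A short derivation from the stated bounds, ignoring the polylogarithmic factors hidden by $\tilde{O}$, shows that the $\tilde{O}(k)$ / $\tilde{O}(n/k)$ trade-off cannot make both operations polylogarithmic: for every choice of $k$, the slower operation costs at least $\sqrt{n}$.

```latex
\max\left(k, \frac{n}{k}\right) \;\ge\; \sqrt{k \cdot \frac{n}{k}} \;=\; \sqrt{n},
\qquad \text{with equality iff } k = \sqrt{n}.
```

So the balanced setting $k = \sqrt{n}$ gives $\tilde{O}(\sqrt{n})$ time for both queries and updates, and any other $k$ makes one of the two operations slower.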
We propose the first data structure that supports both suffix array queries and text updates in $O(\mathrm{polylog}\, n)$ time (achieving $O(\log^4 n)$ and $O(\log^{3+o(1)} n)$ time, respectively). Our data structure is deterministic and the running times for all operations are worst-case. In addition to the standard single-character edits (character insertions, deletions, and substitutions), we support (also in $O(\log^{3+o(1)} n)$ time) the "cut-paste" operation that moves any (arbitrarily long) substring of $T$ to any place in $T$. To achieve our result, we develop a number of new techniques which are of independent interest. These include a new flavor of dynamic locally consistent parsing, as well as a dynamic construction of string synchronizing sets with an extra local sparsity property; this significantly generalizes the sampling technique introduced at STOC 2019. We complement our structure with a hardness result: unless the Online Matrix-Vector Multiplication (OMv) Conjecture fails, no data structure with $O(\mathrm{polylog}\, n)$-time suffix array queries can support the "copy-paste" operation in $O(n^{1-\varepsilon})$ time for any $\varepsilon > 0$.
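The query/update interface described above can be pinned down with a deliberately naive Python baseline. The class `NaiveDynamicSuffixArray` and its method names are hypothetical, and the positional convention for `cut_paste` (the moved block starts at position `p` of the resulting text) is an assumption. The sketch just edits the text and rebuilds the suffix array from scratch on the next query, so it reflects none of the paper's techniques or its polylogarithmic bounds; it only fixes what each operation means.

```python
from typing import List, Optional


def suffix_array(text: str) -> List[int]:
    """Naive SA[1..n] with 1-based positions, obtained by sorting suffixes."""
    return [i + 1 for i in sorted(range(len(text)), key=lambda i: text[i:])]


class NaiveDynamicSuffixArray:
    """Hypothetical baseline illustrating the interface only, NOT the paper's
    structure: updates edit the text and invalidate a cached suffix array,
    which the next query recomputes from scratch. Update arguments are not
    bounds-checked."""

    def __init__(self, text: str) -> None:
        self._text = text
        self._sa: Optional[List[int]] = None  # cached SA; None after any update

    def text(self) -> str:
        return self._text

    def query(self, i: int) -> int:
        """Suffix array query: return SA[i] for 1 <= i <= n."""
        if not 1 <= i <= len(self._text):
            raise IndexError("i must be in [1..n]")
        if self._sa is None:
            self._sa = suffix_array(self._text)
        return self._sa[i - 1]

    def _set(self, new_text: str) -> None:
        self._text = new_text
        self._sa = None

    def substitute(self, i: int, c: str) -> None:
        """Replace T[i] with character c."""
        t = self._text
        self._set(t[: i - 1] + c + t[i:])

    def insert(self, i: int, c: str) -> None:
        """Insert character c so that it becomes T[i], for 1 <= i <= n + 1."""
        t = self._text
        self._set(t[: i - 1] + c + t[i - 1 :])

    def delete(self, i: int) -> None:
        """Delete T[i]."""
        t = self._text
        self._set(t[: i - 1] + t[i:])

    def cut_paste(self, i: int, j: int, p: int) -> None:
        """Move T[i..j] so that it starts at position p of the resulting text
        (assumed convention: 1 <= p <= n - (j - i + 1) + 1)."""
        t = self._text
        block = t[i - 1 : j]
        rest = t[: i - 1] + t[j:]
        self._set(rest[: p - 1] + block + rest[p - 1 :])
```

For example, starting from `banana`, `cut_paste(1, 3, 4)` moves `ban` to the end and yields `anaban`, whose suffix array is `[3, 5, 1, 4, 6, 2]`. Each query after an update here costs time superlinear in $n$, which is exactly the cost the paper's structure avoids.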

References

[1]
Donald Adjeroh, Tim Bell, and Amar Mukherjee. 2008. The Burrows-Wheeler Transform: Data Compression, Suffix Arrays, and Pattern Matching. Springer, Boston, MA, USA. isbn:978-0-387-78909-5 https://doi.org/10.1007/978-0-387-78909-5
[2]
Shyan Akmal and Ce Jin. 2022. Near-Optimal Quantum Algorithms for String Problems. In Proc. SODA. 2791–2832. https://doi.org/10.1137/1.9781611977073.109
[3]
Stephen Alstrup, Gerth Stølting Brodal, and Theis Rauhe. 2000. Pattern matching in dynamic texts. In Proc. SODA. 819–828. http://dl.acm.org/citation.cfm?id=338219.338645
[4]
Mai Alzamel, Maxime Crochemore, Costas S. Iliopoulos, Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter, Juliusz Straszynski, Tomasz Waleń, and Wiktor Zuba. 2019. Quasi-Linear-Time Algorithm for Longest Common Circular Factor. In Proc. CPM. 25:1–25:14. https://doi.org/10.4230/LIPIcs.CPM.2019.25
[5]
Amihood Amir and Itai Boneh. 2020. Update Query Time Trade-Off for Dynamic Suffix Arrays. In Proc. ISAAC. 63:1–63:16. https://doi.org/10.4230/LIPIcs.ISAAC.2020.63
[6]
Amihood Amir and Itai Boneh. 2021. Dynamic Suffix Array with Sub-linear update time and Poly-logarithmic Lookup Time. arXiv:2112.12678.
[7]
Diego Arroyuelo, Gonzalo Navarro, and Kunihiko Sadakane. 2012. Stronger Lempel-Ziv Based Compressed Text Indexing. Algorithmica, 62, 1-2 (2012), 54–101. https://doi.org/10.1007/s00453-010-9443-8
[8]
Djamal Belazzougui, Travis Gagie, Paweł Gawrychowski, Juha Kärkkäinen, Alberto Ordóñez Pereira, Simon J. Puglisi, and Yasuo Tabei. 2015. Queries on LZ-bounded encodings. In Proc. DCC. 83–92. https://doi.org/10.1109/DCC.2015.69
[9]
Philip Bille, Mikko Berggren Ettienne, Inge Li Gørtz, and Hjalte Wedel Vildhøj. 2018. Time-space trade-offs for Lempel-Ziv compressed indexing. Theor. Comput. Sci., 713 (2018), 66–77. https://doi.org/10.1016/j.tcs.2017.12.021
[10]
Philip Bille, Gad M. Landau, Rajeev Raman, Kunihiko Sadakane, Srinivasa Rao Satti, and Oren Weimann. 2015. Random access to grammar-compressed strings and trees. SIAM J. Comput., 44, 3 (2015), 513–539. https://doi.org/10.1137/130936889
[11]
Or Birenzwige, Shay Golan, and Ely Porat. 2020. Locally Consistent Parsing for Text Indexing in Small Space. In Proc. SODA. 607–626. https://doi.org/10.1137/1.9781611975994.37
[12]
Michael Burrows and David J. Wheeler. 1994. A block-sorting lossless data compression algorithm. Digital Equipment Corporation, Palo Alto, California. http://www.hpl.hp.com/techreports/Compaq-DEC/SRC-RR-124.pdf
[13]
Ho-Leung Chan, Wing-Kai Hon, Tak Wah Lam, and Kunihiko Sadakane. 2007. Compressed indexes for dynamic text collections. ACM Trans. Algorithms, 3, 2 (2007), 21. https://doi.org/10.1145/1240233.1240244
[14]
Panagiotis Charalampopoulos, Tomasz Kociumaka, Solon P. Pissis, and Jakub Radoszewski. 2021. Faster Algorithms for Longest Common Substring. In Proc. ESA. 30:1–30:17. https://doi.org/10.4230/LIPIcs.ESA.2021.30
[15]
Panagiotis Charalampopoulos, Tomasz Kociumaka, and Philip Wellnitz. 2020. Faster Approximate Pattern Matching: A Unified Approach. In Proc. FOCS. 978–989. https://doi.org/10.1109/FOCS46700.2020.00095
[16]
Bernard Chazelle. 1988. A Functional Approach to Data Structures and Its Use in Multidimensional Searching. SIAM J. Comput., 17, 3 (1988), 427–462. https://doi.org/10.1137/0217026
[17]
Yu-Feng Chien, Wing-Kai Hon, Rahul Shah, Sharma V. Thankachan, and Jeffrey Scott Vitter. 2015. Geometric BWT: Compressed Text Indexing via Sparse Suffixes and Range Searching. Algorithmica, 71, 2 (2015), 258–278. https://doi.org/10.1007/s00453-013-9792-1
[18]
Anders Roy Christiansen, Mikko Berggren Ettienne, Tomasz Kociumaka, Gonzalo Navarro, and Nicola Prezza. 2021. Optimal-Time Dictionary-Compressed Indexes. ACM Trans. Algorithms, 17, 1 (2021), 8:1–8:39. https://doi.org/10.1145/3426473
[19]
Francisco Claude and Gonzalo Navarro. 2011. Self-Indexed Grammar-Based Compression. Fundam. Informaticae, 111, 3 (2011), 313–337. https://doi.org/10.3233/FI-2011-565
[20]
Francisco Claude, Gonzalo Navarro, and Alejandro Pacheco. 2021. Grammar-compressed indexes with logarithmic search time. J. Comput. Syst. Sci., 118 (2021), 53–74. https://doi.org/10.1016/j.jcss.2020.12.001
[21]
Richard Cole and Uzi Vishkin. 1986. Deterministic Coin Tossing with Applications to Optimal Parallel List Ranking. Inf. Control., 70, 1 (1986), 32–53. https://doi.org/10.1016/S0019-9958(86)80023-7
[22]
Paul F. Dietz and Daniel Dominic Sleator. 1987. Two Algorithms for Maintaining Order in a List. In Proc. STOC. 365–372. https://doi.org/10.1145/28395.28434
[23]
Andrzej Ehrenfeucht, Ross M. McConnell, Nissa Osheim, and Sung-Whan Woo. 2011. Position heaps: A simple and dynamic text indexing data structure. J. Discrete Algorithms, 9, 1 (2011), 100–121. https://doi.org/10.1016/j.jda.2010.12.001
[24]
Paolo Ferragina and Giovanni Manzini. 2005. Indexing compressed text. J. ACM, 52, 4 (2005), 552–581. https://doi.org/10.1145/1082036.1082039
[25]
Johannes Fischer and Paweł Gawrychowski. 2015. Alphabet-Dependent String Searching with Wexponential Search Trees. In Proc. CPM. 160–171. https://doi.org/10.1007/978-3-319-19929-0_14
[26]
Travis Gagie, Paweł Gawrychowski, Juha Kärkkäinen, Yakov Nekrich, and Simon J. Puglisi. 2012. A Faster Grammar-Based Self-index. In Proc. LATA. 240–251. https://doi.org/10.1007/978-3-642-28332-1_21
[27]
Travis Gagie, Paweł Gawrychowski, Juha Kärkkäinen, Yakov Nekrich, and Simon J. Puglisi. 2014. LZ77-Based Self-indexing with Faster Pattern Matching. In Proc. LATIN. 731–742. https://doi.org/10.1007/978-3-642-54423-1_63
[28]
Travis Gagie, Gonzalo Navarro, and Nicola Prezza. 2020. Fully Functional Suffix Trees and Optimal Text Searching in BWT-Runs Bounded Space. J. ACM, 67, 1 (2020), 1–54. https://doi.org/10.1145/3375890
[29]
Paweł Gawrychowski, Adam Karczmarz, Tomasz Kociumaka, Jakub Łącki, and Piotr Sankowski. 2015. Optimal Dynamic Strings. arXiv:1511.02612.
[30]
Paweł Gawrychowski, Adam Karczmarz, Tomasz Kociumaka, Jakub Łącki, and Piotr Sankowski. 2018. Optimal Dynamic Strings. In Proc. SODA. 1509–1528. https://doi.org/10.1137/1.9781611975031.99
[31]
Rodrigo González and Gonzalo Navarro. 2009. Rank/select on dynamic compressed sequences and applications. Theor. Comput. Sci., 410, 43 (2009), 4414–4422. https://doi.org/10.1016/j.tcs.2009.07.022
[32]
Roberto Grossi and Jeffrey Scott Vitter. 2005. Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching. SIAM J. Comput., 35, 2 (2005), 378–407. https://doi.org/10.1137/S0097539702402354
[33]
Ming Gu, Martin Farach, and Richard Beigel. 1994. An Efficient Algorithm for Dynamic Text Indexing. In Proc. SODA. 697–704. http://dl.acm.org/citation.cfm?id=314464.314675
[34]
Dan Gusfield. 1997. Algorithms on Strings, Trees, and Sequences - Computer Science and Computational Biology. Cambridge University Press. isbn:0-521-58519-8 https://doi.org/10.1017/cbo9780511574931
[35]
Torben Hagerup. 1998. Sorting and Searching on the Word RAM. In Proc. STACS. 366–398. https://doi.org/10.1007/BFb0028575
[36]
Meng He and J. Ian Munro. 2010. Succinct Representations of Dynamic Strings. In Proc. SPIRE. 334–346. https://doi.org/10.1007/978-3-642-16321-0_35
[37]
Monika Henzinger, Sebastian Krinninger, Danupon Nanongkai, and Thatchaphol Saranurak. 2015. Unifying and Strengthening Hardness for Dynamic Problems via the Online Matrix-Vector Multiplication Conjecture. In Proc. STOC. 21–30. https://doi.org/10.1145/2746539.2746609
[38]
Juha Kärkkäinen. 1999. Repetition-based Text Indexes. Ph. D. Dissertation. University of Helsinki. https://helda.helsinki.fi/bitstream/handle/10138/21348/repetiti.pdf
[39]
Toru Kasai, Gunho Lee, Hiroki Arimura, Setsuo Arikawa, and Kunsoo Park. 2001. Linear-Time Longest-Common-Prefix computation in suffix arrays and its applications. In Proc. CPM. 181–192. https://doi.org/10.1007/3-540-48194-X_17
[40]
Dominik Kempa and Tomasz Kociumaka. 2019. String synchronizing sets: Sublinear-time BWT construction and optimal LCE data structure. In Proc. STOC. 756–767. https://doi.org/10.1145/3313276.3316368
[41]
Dominik Kempa and Tomasz Kociumaka. 2020. Resolution of the Burrows-Wheeler Transform Conjecture. In Proc. FOCS. 1002–1013. https://doi.org/10.1109/FOCS46700.2020.00097
[42]
Dominik Kempa and Tomasz Kociumaka. 2021. Breaking the O(n)-Barrier in the Construction of Compressed Suffix Arrays. arXiv:2106.12725.
[43]
Dominik Kempa and Tomasz Kociumaka. 2022. Dynamic Suffix Array with Polylogarithmic Queries and Updates. arXiv:2201.01285.
[44]
Tsvi Kopelowitz. 2012. On-Line Indexing for General Alphabets via Predecessor Queries on Subsets of an Ordered List. In Proc. FOCS. 283–292. https://doi.org/10.1109/FOCS.2012.79
[45]
Sebastian Kreft and Gonzalo Navarro. 2013. On compressing and indexing repetitive sequences. Theor. Comput. Sci., 483 (2013), 115–133. https://doi.org/10.1016/j.tcs.2012.02.006
[46]
George S. Lueker and Dan E. Willard. 1982. A Data Structure for Dynamic Range Queries. Inf. Process. Lett., 15, 5 (1982), 209–213. https://doi.org/10.1016/0020-0190(82)90119-3
[47]
Veli Mäkinen, Djamal Belazzougui, Fabio Cunial, and Alexandru I. Tomescu. 2015. Genome-scale algorithm design: Biological sequence analysis in the era of high-throughput sequencing. Cambridge University Press, Cambridge, UK. isbn:9781107078536 https://doi.org/10.1017/cbo9781139940023
[48]
Veli Mäkinen and Gonzalo Navarro. 2008. Dynamic entropy-compressed sequences and full-text indexes. ACM Trans. Algorithms, 4, 3 (2008), 32:1–32:38. https://doi.org/10.1145/1367064.1367072
[49]
Udi Manber and Eugene W. Myers. 1993. Suffix Arrays: A new method for on-line string searches. SIAM J. Comput., 22, 5 (1993), 935–948. https://doi.org/10.1137/0222058
[50]
Shirou Maruyama, Masaya Nakahara, Naoya Kishiue, and Hiroshi Sakamoto. 2013. ESP-index: A compressed index based on edit-sensitive parsing. J. Discrete Algorithms, 18 (2013), 100–112. https://doi.org/10.1016/j.jda.2012.07.009
[51]
Kurt Mehlhorn, R. Sundar, and Christian Uhrig. 1997. Maintaining Dynamic Sequences under Equality Tests in Polylogarithmic Time. Algorithmica, 17, 2 (1997), 183–198. https://doi.org/10.1007/BF02522825
[52]
J. Ian Munro, Gonzalo Navarro, and Yakov Nekrich. 2020. Text Indexing and Searching in Sublinear Time. In Proc. CPM. 24:1–24:15. https://doi.org/10.4230/LIPIcs.CPM.2020.24
[53]
Gonzalo Navarro. 2016. Compact data structures: A practical approach. Cambridge University Press, Cambridge, UK. https://doi.org/10.1017/cbo9781316588284
[54]
Gonzalo Navarro and Veli Mäkinen. 2007. Compressed full-text indexes. ACM Comput. Surv., 39, 1 (2007), 2. https://doi.org/10.1145/1216370.1216372
[55]
Gonzalo Navarro and Yakov Nekrich. 2014. Optimal Dynamic Sequence Representations. SIAM J. Comput., 43, 5 (2014), 1781–1806. https://doi.org/10.1137/130908245
[56]
Gonzalo Navarro and Kunihiko Sadakane. 2014. Fully Functional Static and Dynamic Succinct Trees. ACM Trans. Algorithms, 10, 3 (2014), 16:1–16:39. https://doi.org/10.1145/2601073
[57]
Takaaki Nishimoto, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. 2016. Fully Dynamic Data Structure for LCE Queries in Compressed Space. In Proc. MFCS. 72:1–72:15. https://doi.org/10.4230/LIPIcs.MFCS.2016.72
[58]
Takaaki Nishimoto, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. 2020. Dynamic index and LZ factorization in compressed space. Discret. Appl. Math., 274 (2020), 116–129. https://doi.org/10.1016/j.dam.2019.01.014
[59]
Enno Ohlebusch. 2013. Bioinformatics algorithms: Sequence analysis, genome rearrangements, and phylogenetic reconstruction. Oldenbusch Verlag, Ulm, Germany. isbn:978-3000413162
[60]
Luís M. S. Russo, Gonzalo Navarro, and Arlindo L. Oliveira. 2008. Dynamic Fully-Compressed Suffix Trees. In Proc. CPM. 191–203. https://doi.org/10.1007/978-3-540-69068-9_19
[61]
Süleyman Cenk Sahinalp and Uzi Vishkin. 1994. Symmetry breaking for suffix tree construction. In Proc. STOC. 300–309. https://doi.org/10.1145/195058.195164
[62]
Süleyman Cenk Sahinalp and Uzi Vishkin. 1996. Efficient Approximate and Dynamic Matching of Patterns Using a Labeling Paradigm (extended abstract). In Proc. FOCS. 320–328. https://doi.org/10.1109/SFCS.1996.548491
[63]
Mikaël Salson, Thierry Lecroq, Martine Léonard, and Laurent Mouchard. 2010. Dynamic extended suffix arrays. J. Discrete Algorithms, 8, 2 (2010), 241–257. https://doi.org/10.1016/j.jda.2009.02.007
[64]
Yoshimasa Takabatake, Yasuo Tabei, and Hiroshi Sakamoto. 2014. Improved ESP-index: A Practical Self-index for Highly Repetitive Texts. In Proc. SEA. 338–350. https://doi.org/10.1007/978-3-319-07959-2_29
[65]
Kazuya Tsuruta, Dominik Köppl, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. 2020. Grammar-compressed Self-index with Lyndon Words. arXiv:2004.05309.
[66]
Dan E. Willard and George S. Lueker. 1985. Adding Range Restriction Capability to Dynamic Data Structures. J. ACM, 32, 3 (1985), 597–617. https://doi.org/10.1145/3828.3839

Published In

STOC 2022: Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing
June 2022
1698 pages
ISBN:9781450392648
DOI:10.1145/3519935
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Suffix array
  2. dynamic data structures
  3. pattern matching
  4. string synchronizing sets
  5. text indexing

Conference

STOC '22

Acceptance Rates

Overall Acceptance Rate 1,469 of 4,586 submissions, 32%
