research-article

Optimal Dynamic Sequence Representations

Published: 01 January 2014

Abstract

We describe a data structure that supports access, rank, and select queries, as well as symbol insertions and deletions, on a string $S[1,n]$ over alphabet $[1..\sigma]$ in time $O(\log n/\log\log n)$, which is optimal even on binary sequences and in the amortized sense. Our time is worst case for the queries and amortized for the updates. This complexity improves on the best previous ones by a $\Theta(1+\log\sigma/\log\log n)$ factor. We also design a variant in which all times are worst case, but rank and updates take $O(\log n)$ time. Our structure uses $nH_0(S)+o(n\log\sigma) + O(\sigma\log n)$ bits, where $H_0(S)$ is the zero-order entropy of $S$. Finally, we pursue various extensions and applications of the result.
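
To make the operations named in the abstract concrete, the following is a deliberately naive reference sketch of their semantics together with the zero-order entropy $H_0(S)$. It is not the paper's data structure: every operation here takes $O(n)$ time rather than $O(\log n/\log\log n)$, and no compression is attempted. The names `NaiveDynamicSequence` and `zero_order_entropy`, the 1-based positions, and the convention that an inserted symbol becomes $S[i]$ are assumptions made for illustration.

```python
import math
from collections import Counter


class NaiveDynamicSequence:
    """Reference semantics only (hypothetical class): each operation costs O(n)."""

    def __init__(self, symbols=()):
        self._s = list(symbols)

    def __len__(self):
        return len(self._s)

    def access(self, i):
        """Return S[i], with positions numbered from 1."""
        return self._s[i - 1]

    def rank(self, c, i):
        """Number of occurrences of symbol c in S[1..i]."""
        return self._s[:i].count(c)

    def select(self, c, j):
        """Position of the j-th occurrence of c, or None if there are fewer than j."""
        seen = 0
        for pos, sym in enumerate(self._s, start=1):
            if sym == c:
                seen += 1
                if seen == j:
                    return pos
        return None

    def insert(self, c, i):
        """Insert c so that it becomes S[i] (assumed convention)."""
        self._s.insert(i - 1, c)

    def delete(self, i):
        """Remove and return S[i]."""
        return self._s.pop(i - 1)


def zero_order_entropy(symbols):
    """H_0(S) = sum over symbols c of (n_c/n) * log2(n/n_c), in bits per symbol."""
    n = len(symbols)
    if n == 0:
        return 0.0
    return sum((k / n) * math.log2(n / k) for k in Counter(symbols).values())


seq = NaiveDynamicSequence("abracadabra")
print(seq.rank("a", 4), seq.select("a", 3))
```

For a string of length $n$, `n * zero_order_entropy(S)` gives the leading $nH_0(S)$ term of the space bound quoted in the abstract; the lower-order terms depend on the actual structure and are not modeled here.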

Published In

SIAM Journal on Computing, Volume 43, Issue 5
2014
371 pages
ISSN: 0097-5397
DOI: 10.1137/smjcat.43.5

Publisher

Society for Industrial and Applied Mathematics

United States

Author Tags

  1. succinct data structures
  2. strings
  3. $\mathsf{rank}$ and $\mathsf{select}$

AMS Subject Classifications

  1. 68P05
  2. 68P10
  3. 68P20
  4. 68P30
