Compressed full-text indexes

Authors: Gonzalo Navarro, Veli Mäkinen

ACM Computing Surveys (CSUR), Volume 39, Issue 1

Pages 2 - es

https://doi.org/10.1145/1216370.1216372

Published: 12 April 2007

Abstract

Full-text indexes provide fast substring search over large text collections. A serious problem of these indexes has traditionally been their space consumption. A recent trend is to develop indexes that exploit the compressibility of the text, so that their size is a function of the compressed text length. This concept has evolved into self-indexes, which in addition contain enough information to reproduce any text portion, so they replace the text. The exciting possibility of an index that takes space close to that of the compressed text, replaces it, and in addition provides fast search over it, has triggered a wealth of activity and produced surprising results in a very short time, which radically changed the status of this area in less than 5 years. The most successful indexes nowadays are able to obtain almost optimal space and search time simultaneously.

In this article we present the main concepts underlying (compressed) self-indexes. We explain the relationship between text entropy and regularities that show up in index structures and permit compressing them. Then we cover the most relevant self-indexes, focusing on how they exploit text compressibility to achieve compact structures that can efficiently solve various search problems. Our aim is to give the background to understand and follow the developments in this area.
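As a rough illustration of what "fast substring search" means here, the sketch below builds a plain, uncompressed suffix array and binary-searches it for a pattern. It is not one of the compressed self-indexes the article surveys: it stores one integer per text position and still needs the text itself, which is exactly the space cost the abstract describes. The function names `build_suffix_array` and `find_occurrences` are hypothetical, and the construction simply sorts the suffixes, which is simple but not efficient for large texts.

```python
def build_suffix_array(text):
    """Return the starting positions of all suffixes of text, in lexicographic order.

    Naive construction for illustration only: it sorts the suffixes directly.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return the sorted starting positions where pattern occurs in text.

    All suffixes that start with pattern form one contiguous range of the
    suffix array, so two binary searches are enough to delimit it.
    """
    m = len(pattern)

    # First suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # First suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])


if __name__ == "__main__":
    text = "abracadabra"
    sa = build_suffix_array(text)
    print(find_occurrences(text, sa, "abra"))
```

Each query inspects a logarithmic number of suffixes, comparing at most `len(pattern)` characters per step. A self-index aims to answer the same kind of query within space close to that of the compressed text, without keeping the text separately.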

Information

Published In

ACM Computing Surveys Volume 39, Issue 1

2007

148 pages

ISSN:0360-0300

EISSN:1557-7341

DOI:10.1145/1216370

Copyright © 2007 ACM.

Publisher

Association for Computing Machinery

New York, NY, United States
