Compressed full-text indexes

Published: 12 April 2007

Abstract

Full-text indexes provide fast substring search over large text collections. A serious problem of these indexes has traditionally been their space consumption. A recent trend is to develop indexes that exploit the compressibility of the text, so that their size is a function of the compressed text length. This concept has evolved into self-indexes, which additionally contain enough information to reproduce any portion of the text, so they can replace it. The prospect of an index that takes space close to that of the compressed text, replaces it, and also provides fast search over it has triggered a wealth of activity and produced surprising results in a very short time, radically changing the status of this area in less than five years. The most successful indexes today obtain almost optimal space and search time simultaneously.
In this article we present the main concepts underlying (compressed) self-indexes. We explain the relationship between text entropy and regularities that show up in index structures and permit compressing them. Then we cover the most relevant self-indexes, focusing on how they exploit text compressibility to achieve compact structures that can efficiently solve various search problems. Our aim is to give the background to understand and follow the developments in this area.
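As a rough illustration of the ideas named in the abstract, the following Python sketch builds a naive suffix array, uses it for substring search, and computes the Burrows-Wheeler transform together with two simple compressibility indicators. It is not taken from the article: the function names, the `$` terminator convention, and the quadratic-time construction are all assumptions made for brevity, and no compressed self-index is implemented here.

```python
import math
from collections import Counter


def suffix_array(text: str) -> list[int]:
    """Start positions of all suffixes of `text` in lexicographic order.

    Naive construction by sorting suffix copies; fine for a demo only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """Binary-search `sa` for the suffixes that start with `pattern`."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose length-m prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    hi = len(sa)
    while lo < hi:  # first suffix whose length-m prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


def bwt(text: str, sa: list[int]) -> str:
    """Burrows-Wheeler transform: the character preceding each sorted suffix.

    Assumes `text` ends with a unique terminator smaller than all other
    characters (here '$'), so text[-1] wraps around correctly for suffix 0.
    """
    return "".join(text[i - 1] for i in sa)


def count_runs(s: str) -> int:
    """Number of maximal runs of equal characters in `s`."""
    return sum(1 for i in range(len(s)) if i == 0 or s[i] != s[i - 1])


def zero_order_entropy(s: str) -> float:
    """Empirical zero-order entropy of `s`, in bits per character."""
    n = len(s)
    return sum((c / n) * math.log2(n / c) for c in Counter(s).values())
```

For example, with `text = "abracadabra$"` and `sa = suffix_array(text)`, calling `find_occurrences(text, sa, "abra")` locates the pattern at positions 0 and 7, and `bwt(text, sa)` groups equal characters into runs, which is the kind of regularity that compressed indexes exploit.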


Published In

ACM Computing Surveys, Volume 39, Issue 1
2007
148 pages
ISSN: 0360-0300
EISSN: 1557-7341
DOI: 10.1145/1216370

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Text indexing
  2. Entropy
  3. Text compression

Qualifiers

  • Article
