skip to main content
10.1145/2911451.2914802acmconferencesArticle/Chapter ViewAbstractPublication PagesirConference Proceedingsconference-collections
research-article

Succinct Data Structures in Information Retrieval: Theory and Practice

Published: 07 July 2016 Publication History

Abstract

Succinct data structures are used today in many information retrieval applications, e.g., posting lists representation, language model representation, indexing (social) graphs, query auto-completion, document retrieval and indexing dictionary of strings, just to mention the most recent ones. These new kind of data structures mimic the operations of their classical counterparts within a comparable time complexity but require much less space. With the availability of several libraries for basic succinct structures - like SDSL, Succinct, Facebook?s Folly, and Sux - it is relatively easy to directly profit from advances in this field. In this tutorial we will introduce this field of research by presenting the most important succinct data structures to represent set of integers, set of points, trees, graphs and strings together with their most important applications to Information Retrieval problems. The introduction of the succinct data structures will be sustained with a practical session with programming handouts to solve. This will allow the attendees to directly experiment with implementations of these solutions on real datasets and understand the potential benefits they can bring on their own projects.

References

[1]
J. Barbay. Succinct and compressed data structures for permutations and integer functions. In Encyclopedia of Algorithms. 2015.
[2]
J. Barbay and J. I. Munro. Succinct encoding of permutations: Applications to text indexing. In Encyclopedia of Algorithms. 2008.
[3]
D. Benoit, E. Demaine, J. I. Munro, R. Raman, V.Raman, and S. S. Rao. Representing trees of higher degree. Algorithmica, 43(4):275--292, 2005.
[4]
P. Elias. Efficient storage and retrieval by content and address of static files. Journal of the ACM, 21:246--260, 1974.
[5]
R. M. Fano. On the number of bits required to implement anassociative memory. Memorandum 61, Computer Structures Group, Project MAC, 1971.
[6]
P. Ferragina, R. González, G. Navarro, and R. Venturini. Compressed text indexes: From theory to practice. ACM Journal of Experimental Algorithmics, 13, 2008.
[7]
P. Ferragina and G. Manzini. Indexing compressed text. Journal of the ACM, 52(4):552--581, 2005.
[8]
P. Ferragina, F. Piccinno, and R. Venturini. Compressed indexes for string-searching in labeled graphs. In Proceedings of the 24th International Conference on World Wide Web (WWW), pages --, 2015.
[9]
P. Ferragina and S. S. Rao. Tree compression and indexing. In Encyclopedia of Algorithms. 2008.
[10]
P. Ferragina and R. Venturini. Indexing compressed text. In Encyclopedia of Database Systems, pages 1442--1448. 2009.
[11]
P. Ferragina and R. Venturini. The compressed permuterm index. ACM Transactions on Algorithms, 7(1):10, 2010.
[12]
J. Fischer and V. Heun. Space-efficient preprocessing schemes for range minimum queries on static arrays. SIAM Journal on Computing, 40(2):465--492, 2011.
[13]
L. Foschini, R. Grossi, A. Gupta, and J. S. Vitter. When indexing equals compression: Experiments with compressing suffix arrays and applications. ACM Transactions on Algorithms, 2(4):611--639, 2006.
[14]
S. Gog, T. Beller, A. Moffat, and M. Petri. From theory to practice: Plug and play with succinct data structures. In Proceedings of the 13th International Symposium Experimental Algorithms (SEA), pages 326--337, 2014.
[15]
S. Gog and G. Navarro. Improved single-term top-phk document retrieval. In Proceedings of the Seventeenth Workshop on Algorithm Engineering and Experiments, ALENEX, pages 24--32, 2015.
[16]
S. Gog and M. Petri. Compact indexes for flexible top-k retrieval. In Combinatorial Pattern Matching - 26th Annual Symposium, CPM, pages 207--218, 2015.
[17]
R. Grossi, A. Gupta, and J. S. Vitter. High-order entropy-compressed text indexes. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 841--850, 2003.
[18]
R. Grossi and G. Ottaviano. Fast compressed tries through path decompositions. ACM Journal of Experimental Algorithmics, 19(1), 2014.
[19]
W. Hon, R. Shah, and J. S. Vitter. Space-efficient framework for top-k string retrieval problems. In 50th Annual IEEE Symposium on Foundations of Computer Science, FOCS, pages 713--722, 2009.
[20]
W. Hon, R. Shah, and J. S. Vitter. Space-efficient framework for top-k string retrieval problems. In Proceedings of the 50th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 713--722, 2009.
[21]
B. P. Hsu and G. Ottaviano. Space-efficient data structures for top-phk completion. In Proceedings of the 22nd International World Wide Web Conference (WWW), pages 583--594, 2013.
[22]
R. Konow, G. Navarro, C. L. A. Clarke, and A. López-Ortiz. Faster and smaller inverted indices with treaps. In Proceedings of the 36th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), pages 193--202, 2013.
[23]
G. Navarro. Spaces, trees and colors: The algorithmic landscape of document retrieval on sequences. ACM Computing Surveys, 46(4):article 52, 2014. 47 pages.
[24]
G. Navarro. Wavelet trees for all. Journal Discrete Algorithms, 25:2--20, 2014.
[25]
G. Navarro and V. Mäkinen. Compressed full text indexes. ACM Computing Surveys, 39(1), 2007.
[26]
G. Navarro and Y. Nekrich. Top-phk document retrieval in optimal time and linear space. In Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, SODA, pages 1066--1077, 2012.
[27]
D. Okanohara and K. Sadakane. Practical entropy-compressed rank/select dictionary. In Proceedings of the Nine Workshop on Algorithm Engineering and Experiments, ALENEX, 2007.
[28]
G. Ottaviano, N. Tonellotto, and R. Venturini. Optimal space-time tradeoffs for inverted indexes. In Proceedings of the 8th Annual International ACM Conference on Web Search and Data Mining (WSDM), pages --, 2015.
[29]
G. Ottaviano and R. Venturini. Partitioned elias-fano indexes. In Proceedings of the 37th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), pages 273--282, 2014.
[30]
M. Patil, S. V. Thankachan, R. Shah, W. Hon, J. S. Vitter, and S. Chandrasekaran. Inverted indexes for phrases and strings. In Proceeding of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR, pages 555--564, 2011.
[31]
N. Rahman and R. Raman. Rank and select operations on binary strings. In Encyclopedia of Algorithms. 2008.
[32]
K. Sadakane. Succinct data structures for flexible text retrieval systems. J. Discrete Algorithms, 5(1):12--22, 2007.
[33]
K. Sadakane and G. Navarro. Fully-functional succinct trees. In Proceedings of the 21st Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 134--149, 2010.
[34]
E. Shareghi, M. Petri, G. Haffari, and T. Cohn. Compact, efficient and unlimited capacity: Language modeling with compressed suffix trees. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP, pages 2409--2418, 2015.
[35]
S. Vigna. Quasi-succinct indices. In Proceedings of the Sixth ACM International Conference on Web Search and Data Mining (WSDM), pages 83--92, 2013.
[36]
S. Yata. Marisa trie, https://code.google.com/archive/p/marisa-trie/. 2011.

Cited By

View all

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGIR '16: Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval
July 2016
1296 pages
ISBN:9781450340694
DOI:10.1145/2911451
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 07 July 2016

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. data compression
  2. indexing
  3. succinct data structures

Qualifiers

  • Research-article

Funding Sources

Conference

SIGIR '16
Sponsor:

Acceptance Rates

SIGIR '16 Paper Acceptance Rate 62 of 341 submissions, 18%;
Overall Acceptance Rate 792 of 3,983 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)15
  • Downloads (Last 6 weeks)1
Reflects downloads up to 06 Jan 2025

Other Metrics

Citations

Cited By

View all

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media