skip to main content
10.1145/3533702.3534912acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
short-paper
Open access

LSI: a learned secondary index structure

Published: 11 August 2022 Publication History

Abstract

Learned index structures have been shown to achieve favorable lookup performance and space consumption compared to their traditional counterparts such as B-trees. However, most learned index studies have focused on the primary indexing setting, where the base data is sorted. In this work, we investigate whether learned indexes sustain their advantage in the secondary indexing setting. We introduce Learned Secondary Index (LSI), a first attempt to use learned indexes for indexing unsorted data. LSI works by building a learned index over a permutation vector, which allows binary search to performed on the unsorted base data using random access. We additionally augment LSI with a fingerprint vector to accelerate equality lookups. We show that LSI achieves comparable lookup performance to state-of-the-art secondary indexes while being up to 6x more space efficient.

References

[1]
RobinMap, https://github.com/Tessil/robin-map.
[2]
STX B+ Tree, https://panthema.net/2007/stx-btree/.
[3]
H. Abu-Libdeh, D. Altinbüken, A. Beutel, E. H. Chi, L. Doshi, T. Kraska, X. Li, A. Ly, and C. Olston. Learned indexes for a google-scale disk-based database. CoRR, abs/2012.12501, 2020.
[4]
B. Cohen and D. Boneh. How to Store a Permutation Compactly, https://hackmd.io/@dabo/rkP8Pcf9t.
[5]
A. Crotty. Hist-tree: Those who ignore it are doomed to learn. In 11th Conference on Innovative Data Systems Research, CIDR 2021, Virtual Event, January 11-15, 2021, Online Proceedings. www.cidrdb.org, 2021.
[6]
Y. Dai, Y. Xu, A. Ganesan, R. Alagappan, B. Kroth, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau. From WiscKey to Bourbon: A learned index for log-structured merge trees. In 14th USENIX Symposium on Operating Systems Design and Implementation, OSDI2020, Virtual Event, November 4-6, 2020, pages 155--171. USENIX Association, 2020.
[7]
J. Ding, U. F. Minhas, J. Yu, C. Wang, J. Do, Y. Li, H. Zhang, B. Chandramouli, J. Gehrke, D. Kossmann, D. B. Lomet, and T. Kraska. ALEX: an updatable adaptive learned index. In D. Maier, R. Pottinger, A. Doan, W. Tan, A. Alawini, and H. Q. Ngo, editors, Proceedings of the 2020 International Conference on Management of Data, SIGMOD Conference 2020, online conference [Portland, OR, USA], June 14-19, 2020, pages 969--984. ACM, 2020.
[8]
P. Ferragina and G. Vinciguerra. Learned data structures. In Recent Trends in Learning From Data, volume 896 of Studies in Computational Intelligence. Springer, 2020.
[9]
P. Ferragina and G. Vinciguerra. The PGM-index: a fully-dynamic compressed learned index with provable worst-case bounds. Proc. VLDB Endow., 13(8):1162--1175, 2020.
[10]
A. Hadian and T. Heinis. Shift-Table: A low-latency learned index for range queries using model correction. In Y. Velegrakis, D. Zeinalipour-Yazti, P. K. Chrysanthis, and F. Guerra, editors, Proceedings of the 24th International Conference on Extending Database Technology, EDBT2021, Nicosia, Cyprus, March 23 - 26, 2021, pages 253--264. OpenProceedings.org, 2021.
[11]
C. Kim, J. Chhugani, N. Satish, E. Sedlar, A. D. Nguyen, T. Kaldewey, V. W. Lee, S. A. Brandt, and P. Dubey. FAST: fast architecture sensitive tree search on modern cpus and gpus. In A. K. Elmagarmid and D. Agrawal, editors, Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2010, Indianapolis, Indiana, USA, June 6-10, 2010, pages 339--350. ACM, 2010.
[12]
A. Kipf, R. Marcus, A. van Renen, M. Stoian, A. Kemper, T. Kraska, and T. Neumann. SOSD: A benchmark for learned indexes. NeurIPS Workshop on Machine Learning for Systems, 2019.
[13]
A. Kipf, R. Marcus, A. van Renen, M. Stoian, A. Kemper, T. Kraska, and T. Neumann. RadixSpline: a single-pass learned index. In R. Bordawekar, O. Shmueli, N. Tatbul, and T. K. Ho, editors, Proceedings of the Third International Workshop on Exploiting Artificial Intelligence Techniques for Data Management, aiDM@SIGMOD 2020, Portland, Oregon, USA, June 19, 2020, pages 5:1--5:5. ACM, 2020.
[14]
T. Kraska, A. Beutel, E. H. Chi, J. Dean, and N. Polyzotis. The case for learned index structures. In G. Das, C. M. Jermaine, and P. A. Bernstein, editors, Proceedings of the 2018 International Conference on Management of Data, SIGMOD Conference 2018, Houston, TX, USA, June 10-15, 2018, pages 489--504. ACM, 2018.
[15]
D. H. Lehmer. Teaching combinatorial tricks to a computer. In Proc. Sympos. Appl. Math. Combinatorial Analysis, volume 10, pages 179--193, 1960.
[16]
V. Leis, A. Kemper, and T. Neumann. The Adaptive Radix Tree: ARTful indexing for main-memory databases. In C. S. Jensen, C. M. Jermaine, and X. Zhou, editors, 29th IEEE International Conference on Data Engineering, ICDE 2013, Brisbane, Australia, April 8-12, 2013, pages 38--49. IEEE Computer Society, 2013.
[17]
M. Maltry and J. Dittrich. A critical analysis of recursive model indexes. CoRR, abs/2106.16166, 2021.
[18]
R. Marcus, A. Kipf, A. van Renen, M. Stoian, S. Misra, A. Kemper, T. Neumann, and T. Kraska. Benchmarking learned indexes. Proc. VLDB Endow., 14(1):1--13, 2020.
[19]
R. Marcus, E. Zhang, and T. Kraska. CDFShop: Exploring and optimizing learned index structures. In D. Maier, R. Pottinger, A. Doan, W. Tan, A. Alawini, and H. Q. Ngo, editors, Proceedings of the 2020 International Conference on Management of Data, SIGMOD Conference 2020, online conference [Portland, OR, USA], June 14-19, 2020, pages 2789--2792. ACM, 2020.
[20]
M. Mishra and R. Singhal. RUSLI: real-time updatable spline learned index. In R. Bordawekar, Y. Amsterdamer, O. Shmueli, and N. Tatbul, editors, aiDM '21: Fourth Workshop in Exploiting AI Techniques for Data Management, Virtual Event, China, 25 June, 2021, pages 1--8. ACM, 2021.
[21]
V. Nathan, J. Ding, T. Kraska, and M. Alizadeh. Cortex: Harnessing correlations to boost query performance. CoRR, abs/2012.06683, 2020.
[22]
V. Pandey, A. van Renen, A. Kipf, J. Ding, I. Sabek, and A. Kemper. The case for learned spatial indexes. 2nd International Workshop on Applied AI for Database Systems and Applications, 2020.
[23]
N. F. Setiawan, B. I. P. Rubinstein, and R. Borovica-Gajic. Function interpolation for learned index structures. In R. Borovica-Gajic, J. Qi, and W. Wang, editors, Databases Theory and Applications - 31st Australasian Database Conference, ADC 2020, Melbourne, VIC, Australia, February 3-7, 2020, Proceedings, volume 12008 of Lecture Notes in Computer Science, pages 68--80. Springer, 2020.
[24]
B. Spector, A. Kipf, K. Vaidya, C. Wang, U. F. Minhas, and T. Kraska. Bounding the last mile: Efficient learned string indexing. 3rd International Workshop on Applied AI for Database Systems and Applications, 2021.
[25]
M. Stoian, A. Kipf, R. Marcus, and T. Kraska. PLEX: towards practical learned indexing. 3rd International Workshop on Applied AI for Database Systems and Applications, 2021.
[26]
J. Wu, Y. Zhang, S. Chen, Y. Chen, J. Wang, and C. Xing. Updatable learned index with precise positions. Proc. VLDB Endow., 14(8):1276--1288, 2021.
[27]
Y. Wu, J. Yu, Y. Tian, R. Sidle, and R. Barber. Designing succinct secondary indexing mechanism by exploiting column correlations. In P. A. Boncz, S. Manegold, A. Ailamaki, A. Deshpande, and T. Kraska, editors, Proceedings of the 2019 International Conference on Management of Data, SIGMOD Conference 2019, Amsterdam, The Netherlands, June 30 - July 5, 2019, pages 1223--1240. ACM, 2019.
[28]
E. T. Zacharatou, A. Kipf, I. Sabek, V. Pandey, H. Doraiswamy, and V. Markl. The case for distance-bounded spatial approximations. In 11th Conference on Innovative Data Systems Research, CIDR 2021, Virtual Event, January 11-15, 2021, Online Proceedings. www.cidrdb.org, 2021.

Cited By

View all

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
aiDM '22: Proceedings of the Fifth International Workshop on Exploiting Artificial Intelligence Techniques for Data Management
June 2022
53 pages
ISBN:9781450393775
DOI:10.1145/3533702
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 August 2022

Permissions

Request permissions for this article.

Check for updates

Qualifiers

  • Short-paper

Funding Sources

  • NSF IIS

Conference

SIGMOD/PODS '22
Sponsor:

Acceptance Rates

Overall Acceptance Rate 19 of 26 submissions, 73%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)220
  • Downloads (Last 6 weeks)28
Reflects downloads up to 09 Nov 2024

Other Metrics

Citations

Cited By

View all

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Get Access

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media