DOI: 10.1145/3575693.3575715

Revisiting Log-Structured Merging for KV Stores in Hybrid Memory Systems

Published: 30 January 2023

Abstract

We present MioDB, a novel LSM-tree-based key-value (KV) store system designed to fully exploit the advantages of byte-addressable non-volatile memories (NVMs). Our experimental studies reveal that the performance bottlenecks of LSM-tree-based KV stores using NVMs stem mainly from (1) costly data serialization/deserialization across memory and storage, and (2) unbalanced speed between memory-to-disk data flushing and on-disk data compaction. These bottlenecks can cause unpredictable performance degradation due to write stalls and write amplification. To address these problems, we advocate byte-addressable and persistent skip lists to replace the on-disk data structure of the LSM-tree, and design four novel techniques to make the best use of fast NVMs. First, we propose one-piece flushing to minimize the cost of data serialization from DRAM to NVM. Second, we exploit an elastic NVM buffer with multiple levels and zero-copy compaction to eliminate write stalls and reduce write amplification. Third, we propose parallel compaction to orchestrate data flushing and compactions across all levels of the LSM-tree. Finally, MioDB increases the depth of the LSM-tree and exploits Bloom filters to improve the read performance. Our extensive experimental studies demonstrate that MioDB achieves 17.1× and 21.7× lower 99.9th percentile latency, 8.3× and 2.5× higher random write throughput, and up to 5× and 4.9× lower write amplification compared with the state-of-the-art NoveLSM and MatrixKV, respectively.
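
The abstract does not spell out how zero-copy compaction works. As a rough illustration only, the sketch below assumes that it means merging two sorted, pointer-linked structures by rewriting links rather than copying key-value payloads into a new file. It uses a plain sorted linked list (the bottom level of a skip list) in ordinary Python memory. The names `Node`, `build`, `merge_by_relinking`, and `items` are hypothetical and are not taken from MioDB, and persistence, crash consistency, and upper skip-list levels are all omitted.

```python
class Node:
    """One KV entry. In a persistent skip list this would live in NVM (assumption)."""
    __slots__ = ("key", "value", "next")

    def __init__(self, key, value, next=None):
        self.key = key
        self.value = value
        self.next = next


def build(pairs):
    """Build a sorted singly linked list from (key, value) pairs already sorted by key."""
    head = None
    for key, value in reversed(pairs):
        head = Node(key, value, head)
    return head


def merge_by_relinking(newer, older):
    """Merge two sorted lists by rewriting `next` pointers only.

    No Node is allocated for the result and no key or value is copied.
    On a duplicate key the node from `newer` is kept and the one from
    `older` is unlinked (dropped).
    """
    dummy = Node(None, None)  # temporary anchor, not part of the result
    tail = dummy
    while newer is not None and older is not None:
        if newer.key < older.key:
            tail.next, newer = newer, newer.next
        elif older.key < newer.key:
            tail.next, older = older, older.next
        else:
            # Same key in both lists: the newer version shadows the older one.
            tail.next, newer, older = newer, newer.next, older.next
        tail = tail.next
    # Attach whatever remains of the list that is not yet exhausted.
    tail.next = newer if newer is not None else older
    return dummy.next


def items(head):
    """Collect (key, value) pairs in list order, for inspection."""
    out = []
    while head is not None:
        out.append((head.key, head.value))
        head = head.next
    return out
```

A conventional LSM-tree compaction would instead read both inputs, serialize the merged result into a new on-disk table, and delete the old ones, which is where the write amplification mentioned above comes from. In this sketch the only writes are pointer updates, which is the property the term "zero-copy" is taken to suggest here.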

Published In

ASPLOS 2023: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2
January 2023
947 pages
ISBN: 9781450399166
DOI: 10.1145/3575693
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Key-Value Store
  2. LSM-tree Compaction
  3. Log-Structured Merge
  4. Non-Volatile Memory
  5. Skip List

Qualifiers

  • Research-article

Funding Sources

  • National Key Research and Development Program of China
  • National Natural Science Foundation of China (NSFC)

Conference

ASPLOS '23

Acceptance Rates

Overall Acceptance Rate 535 of 2,713 submissions, 20%

Article Metrics

  • Downloads (Last 12 months): 988
  • Downloads (Last 6 weeks): 22
Reflects downloads up to 23 Jan 2025
