Put an elephant into a fridge: optimizing cache efficiency for in-memory key-value stores

Authors:

Feng Chen

Proceedings of the VLDB Endowment, Volume 13, Issue 9

Pages 1540 - 1554

https://doi.org/10.14778/3397230.3397247

Published: 01 May 2020

Abstract

In today's data centers, memory-based key-value systems, such as Memcached and Redis, play an indispensable role in providing high-speed data services. The rapidly growing capacity and quickly falling price of DRAM in recent years have made it possible to build large memory-based key-value stores that serve hundreds of gigabytes to even terabytes of key-value data entirely in memory. Unfortunately, CPU cache capacity in modern processors has not grown at a similar pace and remains at the level of a few dozen megabytes. Such an extremely low cache-to-memory ratio (less than 0.1%) poses a significant new challenge: the limited CPU cache is becoming a severe performance bottleneck that hinders us from fully exploiting the potential of high-speed memory-based key-value stores.

To address this critical challenge, we propose a highly cache-efficient scheme, called Cavast, to optimize the cache utilization of large-capacity in-memory key-value stores. Our goal is to maximize cache efficiency and system performance without any hardware changes. We first present two lightweight, software-only mechanisms that enable users to indirectly control the cache content at the application level. We then propose a set of optimization policies to address several critical design issues that impair cache efficacy in current key-value store systems. By carefully reorganizing the data layout in memory, redesigning the hash indexing structure, and offloading garbage collection, we effectively improve the utilization of the limited cache space. We have developed a Linux kernel module to provide kernel-level support and implemented two prototypes, based on Memcached and Redis, with the proposed Cavast scheme. Our experimental studies show promising results. On a 6-core Intel Xeon processor with only a 15-MB cache, we raise the cache hit ratio to as high as 82.7% at a very small cache-to-memory ratio (0.023%) and increase key-value system throughput by a factor of up to 4.2.
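To give a feel for the scale behind the quoted cache-to-memory ratios, the following minimal sketch works through the arithmetic. It assumes, as the abstract does not state explicitly, that the ratio is last-level cache capacity divided by the size of the in-memory data set, and that "MB" means binary megabytes. The function names are illustrative only and do not come from the paper.

```python
# Back-of-the-envelope check of the cache-to-memory ratios in the abstract.
# Assumption (not stated in the abstract): ratio = cache capacity / size of
# the in-memory key-value data, with binary units (MiB, GiB).

MIB = 1024 ** 2
GIB = 1024 ** 3


def cache_to_memory_ratio(cache_bytes: int, memory_bytes: int) -> float:
    """Return cache capacity as a fraction of memory capacity."""
    return cache_bytes / memory_bytes


def implied_memory_bytes(cache_bytes: int, ratio: float) -> float:
    """Return the memory size implied by a cache size and a ratio."""
    return cache_bytes / ratio


cache = 15 * MIB  # the 15-MB cache quoted in the abstract

# A 0.023% ratio implies a data set on the order of tens of gigabytes.
implied_gib = implied_memory_bytes(cache, 0.023 / 100) / GIB

# Conversely, a hypothetical 64 GiB data set gives a ratio close to 0.023%.
ratio_percent = cache_to_memory_ratio(cache, 64 * GIB) * 100

if __name__ == "__main__":
    print(f"implied memory: {implied_gib:.1f} GiB")
    print(f"ratio for 64 GiB: {ratio_percent:.4f}%")
```

Under these assumptions the 0.023% figure corresponds to a working set several thousand times larger than the cache, which is why a hit ratio above 80% is notable.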

Information & Contributors

Information

Published In

Proceedings of the VLDB Endowment Volume 13, Issue 9

May 2020

295 pages

ISSN:2150-8097

Editors: Magdalena Balazinska (University of Washington), Xiaofang Zhou (University of Queensland, Australia)

Publisher

VLDB Endowment

Qualifiers

Research-article

Cited By

Y. Yao, X. Wang, D. Zhou, L. Li, J. Wu, L. Zhu, Z. Wang, and Y. Luo. EKRM: Efficient Key-Value Retrieval Method to Reduce Data Lookup Overhead for Redis. In Euro-Par 2024: Parallel Processing, pages 166--179, 2024. Online publication date: 26-Aug-2024.
https://dl.acm.org/doi/10.1007/978-3-031-69577-3_12

R. Kühn, D. Biebert, C. Hakert, J. Chen, and J. Teubner. Towards Data-Based Cache Optimization of B+-Trees. In Proceedings of the 19th International Workshop on Data Management on New Hardware, pages 63--69, 2023. Online publication date: 18-Jun-2023.
https://dl.acm.org/doi/10.1145/3592980.3595316

Y. Hu, Q. Jiang, C. Wang, and A. Takahashi. Exploring Architectural Implications to Boost Performance for in-NVM B+-Tree. In Proceedings of the 28th Asia and South Pacific Design Automation Conference, pages 116--121, 2023. Online publication date: 16-Jan-2023.
https://dl.acm.org/doi/10.1145/3566097.3567861

D. Zhuo, K. Zhang, Z. Li, S. Zhuang, S. Wang, A. Chen, and I. Stoica. Rearchitecting in-memory object stores for low latency. Proceedings of the VLDB Endowment, 15(3):555--568, 2022. Online publication date: 4-Feb-2022.
https://dl.acm.org/doi/10.14778/3494124.3494138
