DOI: 10.1145/3582016.3582063
Research Article

TPP: Transparent Page Placement for CXL-Enabled Tiered-Memory

Published: 25 March 2023

Abstract

The increasing demand for memory in hyperscale applications has made memory a large portion of the overall datacenter spend. The emergence of coherent interfaces like CXL enables main memory expansion and offers an efficient solution to this problem. In such systems, main memory can comprise different memory technologies with varied characteristics. In this paper, we characterize the memory usage patterns of a wide range of datacenter applications across Meta's server fleet. This characterization demonstrates the opportunity to offload colder pages to slower memory tiers for these applications. Without efficient memory management, however, such systems can significantly degrade performance.
We propose TPP, a novel OS-level, application-transparent page placement mechanism for CXL-enabled memory. TPP employs a lightweight mechanism to identify hot and cold pages and place them in the appropriate memory tiers. It proactively demotes colder pages from local memory to CXL-Memory, maintaining memory headroom for new page allocations, which are often tied to request processing and tend to be short-lived and hot. At the same time, TPP promptly promotes performance-critical hot pages trapped in the slower CXL-Memory to fast local memory, while minimizing both sampling overhead and unnecessary migrations. TPP works transparently, without any application-specific knowledge, and can be deployed fleet-wide as part of a kernel release.
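To make the mechanism concrete, here is a minimal user-space sketch (not TPP itself, which operates inside the kernel) that migrates a single page between NUMA nodes using the standard Linux move_pages(2) interface from libnuma. The destination node ID is hypothetical, standing in for a CPU-less NUMA node backed by CXL-Memory; TPP drives this same kernel migration machinery transparently and at scale.

    /*
     * Sketch only: manually migrate one page to another NUMA node with
     * move_pages(2). On a CXL system, CXL-Memory typically appears as a
     * CPU-less NUMA node; node 1 below is a hypothetical ID for it.
     * Build: cc demo.c -lnuma
     */
    #include <numaif.h>   /* move_pages(), MPOL_MF_MOVE (libnuma) */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void) {
        long page_sz = sysconf(_SC_PAGESIZE);
        void *buf = aligned_alloc(page_sz, page_sz);
        memset(buf, 0xab, page_sz);        /* touch it so the page is faulted in */

        void *pages[1]  = { buf };
        int   nodes[1]  = { 1 };           /* hypothetical CXL-Memory node */
        int   status[1] = { -1 };          /* out: node reached, or -errno */

        /* Ask the kernel to migrate the calling process's page (pid 0 = self). */
        if (move_pages(0, 1, pages, nodes, status, MPOL_MF_MOVE) < 0)
            perror("move_pages");
        else
            printf("page now on node %d\n", status[0]);

        free(buf);
        return 0;
    }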
We evaluate TPP with diverse memory-sensitive workloads on production servers using early samples of new x86 CPUs with CXL 1.1 support. TPP brings a tiered memory system within 1% of an ideal baseline that has all of its memory in the local tier. It outperforms today's Linux by 18%, and existing solutions such as NUMA Balancing and AutoTiering by 5-17%. Most of the TPP patches have been merged in the Linux v5.18 release; the remaining ones are pending upstream discussion.
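For context, on kernels that carry the upstreamed patches, the TPP-derived behavior sits behind two documented knobs: the kernel.numa_balancing sysctl set to mode 2 (NUMA_BALANCING_MEMORY_TIERING), which enables tiering-aware hot-page promotion, and /sys/kernel/mm/numa/demotion_enabled, which lets reclaim demote cold pages to the slow tier instead of evicting them. A minimal sketch for enabling both, assuming these interfaces as documented for recent kernels (exact paths and semantics may vary by version):

    /*
     * Sketch only: turn on the upstream tiered-memory knobs. Requires root.
     * Paths follow the documented interfaces on recent kernels; they are
     * assumptions here and may differ across kernel versions.
     */
    #include <stdio.h>

    static void write_knob(const char *path, const char *val) {
        FILE *f = fopen(path, "w");
        if (!f) { perror(path); return; }
        fputs(val, f);
        fclose(f);
    }

    int main(void) {
        /* Mode 2 = NUMA_BALANCING_MEMORY_TIERING: promote hot pages. */
        write_knob("/proc/sys/kernel/numa_balancing", "2");
        /* Let reclaim demote cold pages to the slow tier instead of evicting. */
        write_knob("/sys/kernel/mm/numa/demotion_enabled", "1");
        return 0;
    }

The same effect can be had from a shell with sysctl kernel.numa_balancing=2 and by writing 1 to the demotion_enabled file.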



Published In

ASPLOS 2023: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3
March 2023, 820 pages
ISBN: 9781450399180
DOI: 10.1145/3582016

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. CXL-Memory
  2. Datacenters
  3. Heterogeneous System
  4. Memory Management
  5. Operating Systems
  6. Tiered-Memory


Conference

ASPLOS '23

Acceptance Rates

Overall Acceptance Rate: 535 of 2,713 submissions, 20%

