DOI: 10.1145/3625549.3658659
Research article · Open access

IDT: Intelligent Data Placement for Multi-tiered Main Memory with Reinforcement Learning

Published: 30 August 2024

Abstract

A single DRAM-based tier cannot satisfy all of the demands placed on main memory, so multi-tiered memory systems are being widely adopted. To support these systems, operating-system-level solutions that analyze an application's memory access patterns and place its data in the appropriate memory tier have been extensively explored.
In this paper, we identify reinforcement learning (RL) as an effective solution for tiered memory management: the management policy can be formulated as a problem that RL can solve. We also demonstrate that an effective region-granularity memory access monitoring method is necessary to provide an accurate environment state to the RL model. We therefore propose IDT, an intelligent data placement scheme for multi-tiered main memory. IDT incorporates RL-based autotuning of the demotion policy and a mechanism that efficiently demotes cold pages to lower-tier memory. IDT also promotes hot pages to upper-tier memory using a lightweight machine learning algorithm, minimizing accesses to slow memory. IDT employs region-granularity memory access monitoring with statistical-testing-based merging and splitting of adjacent regions, which improves precision and mitigates the ambiguity observed in prior work. Experiments on a real four-tiered memory system show that IDT achieves an average 2.08× speedup over the default Linux kernel and an 11.2% performance improvement over the state-of-the-art solution.
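The abstract does not say which statistical test IDT uses or how it decides to merge or split regions. The Python sketch below only illustrates the general idea of statistical-testing-based merge and split, under illustrative assumptions: each region stores one access count per page (in page order), Welch's t statistic compares two groups of counts, and a fixed critical value of 1.96 (the large-sample, two-sided 5% level) decides whether a difference is significant. The names `Region`, `welch_t`, `merge_adjacent`, `split_if_heterogeneous` and `T_CRITICAL` are hypothetical and are not taken from IDT's implementation.

```python
from dataclasses import dataclass
from statistics import mean, variance
from typing import List

# Hypothetical threshold (assumption, not IDT's value): |t| below this is
# treated as "no significant difference" (large-sample two-sided 5% level).
T_CRITICAL = 1.96


@dataclass
class Region:
    start: int          # first page number (inclusive)
    end: int            # last page number (exclusive)
    samples: List[int]  # assumed: one access count per page, in page order


def welch_t(a: List[int], b: List[int]) -> float:
    """Welch's t statistic for two samples with possibly unequal variances."""
    if len(a) < 2 or len(b) < 2:
        return 0.0  # too few samples to claim a difference
    se2 = variance(a) / len(a) + variance(b) / len(b)
    if se2 == 0:
        # Both groups are constant: they differ only if their means differ.
        return 0.0 if mean(a) == mean(b) else float("inf")
    return (mean(a) - mean(b)) / se2 ** 0.5


def merge_adjacent(regions: List[Region]) -> List[Region]:
    """Merge contiguous neighbours whose access counts are not significantly different."""
    merged: List[Region] = []
    for r in regions:
        prev = merged[-1] if merged else None
        if (prev is not None and prev.end == r.start
                and abs(welch_t(prev.samples, r.samples)) < T_CRITICAL):
            merged[-1] = Region(prev.start, r.end, prev.samples + r.samples)
        else:
            merged.append(r)
    return merged


def split_if_heterogeneous(r: Region) -> List[Region]:
    """Split a region in half if its two halves differ significantly."""
    n = len(r.samples)
    if n < 4:
        return [r]
    mid = n // 2
    left, right = r.samples[:mid], r.samples[mid:]
    if abs(welch_t(left, right)) < T_CRITICAL:
        return [r]
    boundary = r.start + mid  # valid because of the one-sample-per-page assumption
    return [Region(r.start, boundary, left), Region(boundary, r.end, right)]


# Usage: two similarly warm neighbours merge; the cold region stays separate.
regions = [
    Region(0, 4, [9, 10, 11, 10]),
    Region(4, 8, [10, 9, 11, 10]),
    Region(8, 12, [0, 1, 0, 1]),
]
merged = merge_adjacent(regions)
```

Merging keeps the number of monitored regions small when neighbouring pages behave alike, while splitting restores resolution when a region straddles hot and cold pages. A practical monitor would additionally cap the total region count and decide how often to re-test.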

Published In

HPDC '24: Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing
June 2024
436 pages
ISBN:9798400704130
DOI:10.1145/3625549
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. memory tiering
  2. emerging memory technologies
  3. memory management
  4. reinforcement learning

Conference

HPDC '24

Acceptance Rates

Overall Acceptance Rate 166 of 966 submissions, 17%

Article Metrics

  • Total citations: 0
  • Total downloads: 126
  • Downloads (last 12 months): 126
  • Downloads (last 6 weeks): 126
Reflects downloads up to 16 Oct 2024.
