research-article

Open access

IDT: Intelligent Data Placement for Multi-tiered Main Memory with Reinforcement Learning

Authors:

Jung Ho Ahn

HPDC '24: Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing

Pages 69 - 82

https://doi.org/10.1145/3625549.3658659

Published: 30 August 2024

Abstract

Because a single DRAM-based tier cannot satisfy all the demands placed on main memory, multi-tiered memory systems are being widely adopted. To support these systems, operating-system-level solutions that analyze an application's memory access patterns and place data in the appropriate memory tier have been extensively explored.

In this paper, we identify reinforcement learning (RL) as an effective solution for tiered memory management and show that the management policy can be formulated as a problem that RL can solve. We also demonstrate that an effective region-granularity memory access monitoring method is necessary to provide an accurate environment state to the RL model. We therefore propose IDT, an intelligent data placement scheme for multi-tiered main memory. IDT incorporates RL-based autotuning of the demotion policy and a mechanism that efficiently demotes cold pages to lower-tier memory. IDT also promotes hot pages to upper-tier memory, using a lightweight machine learning algorithm, to minimize accesses to slow memory. IDT employs region-granularity memory access monitoring that merges and splits adjacent regions based on statistical testing, improving precision and mitigating the ambiguity observed in prior work. Experiments on an actual four-tiered memory system show that IDT achieves an average 2.08× speedup over the default Linux kernel and an 11.2% performance improvement over the state-of-the-art solution.
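To illustrate the idea of merging adjacent regions based on a statistical test, here is a minimal sketch. It is not the paper's implementation: the abstract does not say which test IDT uses, so the two-proportion z-test, the `Region` fields, the function names, and the 1.96 threshold (roughly a 5% significance level) are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Region:
    # Hypothetical monitoring record; field names are illustrative only.
    start: int    # first page number (inclusive)
    end: int      # last page number (exclusive)
    hits: int     # samples in which the region was observed as accessed
    samples: int  # total samples taken for the region


def rates_differ(a, b, z_crit=1.96):
    """Two-proportion z-test: True if the access rates differ significantly."""
    pooled = (a.hits + b.hits) / (a.samples + b.samples)
    var = pooled * (1 - pooled) * (1 / a.samples + 1 / b.samples)
    if var == 0:
        # Both regions were always hit or never hit, so the rates are equal.
        return False
    z = (a.hits / a.samples - b.hits / b.samples) / math.sqrt(var)
    return abs(z) > z_crit


def merge_adjacent(regions, z_crit=1.96):
    """Merge contiguous regions whose access rates are statistically alike."""
    if not regions:
        return []
    merged = [regions[0]]
    for r in regions[1:]:
        last = merged[-1]
        if last.end == r.start and not rates_differ(last, r, z_crit):
            # Combine the address range and pool the access statistics.
            merged[-1] = Region(last.start, r.end,
                                last.hits + r.hits,
                                last.samples + r.samples)
        else:
            merged.append(r)
    return merged


regions = [
    Region(0, 100, 90, 100),    # hot
    Region(100, 200, 88, 100),  # hot, similar rate to its left neighbour
    Region(200, 300, 5, 100),   # cold
]
result = merge_adjacent(regions)
```

In this example the two hot regions are merged into one, while the cold region stays separate because its access rate differs significantly. A split step could apply the same test to the two halves of a region and keep them apart when the test rejects equal rates.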

Information & Contributors

Information

Published In

HPDC '24: Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing

June 2024

436 pages

ISBN:9798400704130

DOI:10.1145/3625549

Chair: Patrizio Dazzi
Co-chair: Gabriele Mencagli
Program Chair: David Lowenthal
Program Co-chair: Rosa M Badia

This work is licensed under a Creative Commons Attribution International 4.0 License.

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

In-Cooperation

SIGHPC: ACM Special Interest Group on High Performance Computing

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

KIAT
IIlP
IITP
SK Hynix

Conference

HPDC '24

Sponsor:

SIGARCH

HPDC '24: 33rd International Symposium on High-Performance Parallel and Distributed Computing

June 3 - 7, 2024

Pisa, Italy

Acceptance Rates

Overall Acceptance Rate 166 of 966 submissions, 17%
