DOI:10.1145/3613424.3623784
research-article
Public Access

MVC: Enabling Fully Coherent Multi-Data-Views through the Memory Hierarchy with Processing in Memory

Published: 08 December 2023

Abstract

Fusing computation and memory through Processing-in-Memory (PIM) offers a radical solution to the memory wall problem: it minimizes communication overheads for data-intensive tasks and marks a significant shift in computer architecture. Although PIM has demonstrated promising results at different layers of the memory hierarchy, few studies have explored integrating compute memories into the memory management system, particularly with respect to coherence protocols. This paper presents MVC, a framework that leverages existing coherence protocols to enable fully coherent views throughout the memory hierarchy. By introducing coherent views, which are user-defined compact representations of conventional data structures, MVC can minimize data movement and harness the reusability of PIM output. The locality-aware MVC views significantly enhance the performance and energy efficiency of various irregular workloads.
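
The idea of a coherent view can be illustrated with a small software analogy. The sketch below is not the mechanism described in the paper, which operates in hardware through coherence protocols and is not detailed in this abstract. It only shows the general pattern of a compact, derived copy of one field that is invalidated when the underlying data changes and reused while it stays valid. All names (Record, CoherentView, write_base, rebuilds) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Record:
    """A conventional record; only `key` is needed by the consumer."""
    key: int
    payload: str


class CoherentView:
    """Hypothetical software analogy of a coherent view (an assumption,
    not the paper's design): a compact copy of one projected field that
    is invalidated whenever the base records are written."""

    def __init__(self, base: List[Record], project: Callable[[Record], int]):
        self._base = base
        self._project = project
        self._cache: Optional[List[int]] = None  # None marks the view as invalid
        self.rebuilds = 0  # counts how often the view had to be regenerated

    def read(self) -> List[int]:
        # Regenerate the compact representation only when it is invalid;
        # otherwise reuse the previously produced output.
        if self._cache is None:
            self._cache = [self._project(r) for r in self._base]
            self.rebuilds += 1
        return self._cache

    def write_base(self, index: int, record: Record) -> None:
        # A write to the base data invalidates the derived view so that
        # later reads never observe stale values.
        self._base[index] = record
        self._cache = None


records = [Record(3, "a"), Record(1, "b"), Record(2, "c")]
view = CoherentView(records, lambda r: r.key)

first = view.read()    # builds the compact key-only view
second = view.read()   # reuses it without rebuilding
view.write_base(1, Record(9, "x"))
third = view.read()    # rebuilt after the write invalidated it
```

In this analogy the consumer touches only the compact key list rather than the full records, and repeated reads between writes reuse the same output, which loosely mirrors the data-movement and reuse benefits the abstract attributes to views.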

      Published In

      MICRO '23: Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture
      October 2023
      1528 pages
      ISBN:9798400703294
      DOI:10.1145/3613424
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. Cache Coherence Protocol
      2. Caches
      3. Processing-in-Memory

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      MICRO '23
      Acceptance Rates

      Overall Acceptance Rate 484 of 2,242 submissions, 22%

      Article Metrics

      • Total Citations: 0
      • Total Downloads: 625
      • Downloads (Last 12 months): 500
      • Downloads (Last 6 weeks): 45
      Reflects downloads up to 21 Jan 2025
