The Bitlet Model: A Parameterized Analytical Model to Compare PIM and CPU Systems

Authors:

Orian Leitersdorf,

Kunal Korgaonkar,

Anupam Chattopadhyay,

Shahar Kvatinsky

ACM Journal on Emerging Technologies in Computing Systems (JETC), Volume 18, Issue 2

Article No.: 43, Pages 1 - 29

https://doi.org/10.1145/3465371

Published: 25 March 2022

Abstract

Currently, data-intensive applications are gaining popularity. Together with this trend, processing-in-memory (PIM)–based systems are being given more attention and have become more relevant. This article describes an analytical modeling tool called Bitlet that can be used in a parameterized fashion to estimate the performance and power/energy of a PIM-based system and, thereby, assess the affinity of workloads for PIM as opposed to traditional computing. The tool uncovers interesting trade-offs between, mainly, the PIM computation complexity (cycles required to perform a computation through PIM), the amount of memory used for PIM, the system memory bandwidth, and the data transfer size. Despite its simplicity, the model reveals new insights when applied to real-life examples. The model is demonstrated for several synthetic examples and then applied to explore the influence of different parameters on two systems — IMAGING and FloatPIM. Based on the demonstrations, insights about PIM and its combination with a CPU are provided.
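
As a rough illustration of how the four quantities named in the abstract could interact, the sketch below assumes a simple throughput formulation. The class and function names and the formulas themselves are assumptions made for this example; the abstract does not state the model's equations, so this should not be read as the article's method.

```python
from dataclasses import dataclass


@dataclass
class PimParams:
    """Hypothetical parameter set; names are illustrative, not from the article."""
    cycles_per_op: float   # PIM computation complexity: cycles per computation
    cycle_time_s: float    # assumed duration of one PIM cycle, in seconds
    parallel_units: float  # computations running at once (grows with memory used for PIM)


def pim_throughput(p: PimParams) -> float:
    """Operations per second, assuming all parallel units compute in lockstep."""
    return p.parallel_units / (p.cycles_per_op * p.cycle_time_s)


def transfer_throughput(bandwidth_bits_per_s: float, bits_per_op: float) -> float:
    """Operations per second that the memory bus can feed, given the data moved per operation."""
    return bandwidth_bits_per_s / bits_per_op


def serial_throughput(tp_a: float, tp_b: float) -> float:
    """Throughput of two stages run one after the other (assumption: no overlap)."""
    return 1.0 / (1.0 / tp_a + 1.0 / tp_b)
```

Under these assumptions, a compute stage and a transfer stage that each sustain 4 operations per unit time combine to 2, which suggests why lowering either the cycles per computation or the data transfer size can shift the balance between PIM and CPU execution.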


Index Terms

The Bitlet Model: A Parameterized Analytical Model to Compare PIM and CPU Systems
1. Computing methodologies
  1. Modeling and simulation
    1. Model development and analysis
2. Hardware
  1. Emerging technologies
    1. Analysis and design of emerging devices and systems
      1. Emerging architectures

Published In

ACM Journal on Emerging Technologies in Computing Systems Volume 18, Issue 2

April 2022

411 pages

ISSN:1550-4832

EISSN:1550-4840

DOI:10.1145/3508462

Editor:
Ramesh Karri
Polytechnic Institute of New York University, USA

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Journal Family

ACM Journals for the Design of Smart and Connected Systems

Publication History

Published: 25 March 2022

Accepted: 01 May 2021

Revised: 01 February 2021

Received: 01 August 2020

Published in JETC Volume 18, Issue 2

Qualifiers

Research-article
Refereed

Funding Sources

European Research Council through the European Union’s Horizon 2020 Research and Innovation Programme
Israel Science Foundation
