DOI: 10.1145/3400302.3415664

Unlocking wordline-level parallelism for fast inference on RRAM-based DNN accelerator

Published: 17 December 2020

Abstract

In-memory computing is rapidly emerging as a viable way to accelerate neural networks by overcoming the memory wall. The Resistive RAM (RRAM) crossbar array is in the spotlight as a building block for DNN inference accelerators because it can perform a massive number of dot products in memory in an area- and power-efficient manner. However, its in-memory computation is vulnerable to errors caused by the non-ideality of RRAM cells. This error-prone nature limits the crossbar's wordline-level parallelism: activating a large number of wordlines accumulates non-zero current contributions from RRAM cells in the high-resistance state, as well as current deviations from individual cells, leading to a significant accuracy drop. To improve performance by increasing the maximum number of concurrently activated wordlines, we propose two techniques. First, we introduce a lightweight scheme that effectively eliminates the current contributions of high-resistance-state cells. Second, based on the observation that the layers of a neural network differ in their error rates and in their impact on inference accuracy, we allow different layers to activate non-uniform numbers of wordlines concurrently. We also introduce a systematic methodology to determine the number of concurrently activated wordlines for each layer, with the goal of optimizing performance while minimizing accuracy degradation. Our proposed techniques increase inference throughput by 3-10× with less than a 1% accuracy drop across three datasets. Our evaluation also demonstrates that this benefit comes at a small cost: only 8.2% and 5.3% increases in area and power consumption, respectively.
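To make the per-layer methodology concrete, the sketch below shows one way such a search could be organized: a greedy pass that assigns each layer the largest wordline count whose error-injected accuracy stays within a fixed budget (here 1%, matching the accuracy-drop target reported in the abstract) of the baseline. This is a minimal illustration, not the paper's exact procedure; the candidate counts and the `evaluate_accuracy` hook (an inference pass with an RRAM error model injected according to the configuration) are assumptions.

```python
# Minimal sketch of a greedy per-layer search for the number of concurrently
# activated wordlines. NOT the paper's exact methodology: the candidate counts
# and the error-injected evaluate_accuracy(model, config) hook are assumptions.

CANDIDATE_WORDLINES = [16, 32, 64, 128]  # hypothetical; larger = higher throughput


def choose_wordlines_per_layer(model, layers, evaluate_accuracy,
                               baseline_acc, budget=0.01):
    """Assign each layer the largest wordline count whose error-injected
    accuracy stays within `budget` (e.g., 1%) of `baseline_acc`."""
    # Start every layer at the most conservative (smallest) setting.
    config = {layer: min(CANDIDATE_WORDLINES) for layer in layers}
    for layer in layers:
        for n in sorted(CANDIDATE_WORDLINES, reverse=True):
            trial = {**config, layer: n}  # vary only this layer's setting
            # evaluate_accuracy is assumed to run inference with an RRAM
            # error model matching the per-layer wordline counts in `trial`.
            if baseline_acc - evaluate_accuracy(model, trial) <= budget:
                config[layer] = n  # keep the fastest setting that is safe
                break
    return config
```

Because each layer is tuned against the accuracy of the complete configuration, a greedy pass avoids the exponential cost of searching all per-layer combinations; ordering layers by their observed error sensitivity would be a natural refinement.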


Information

Published In

ICCAD '20: Proceedings of the 39th International Conference on Computer-Aided Design
November 2020
1396 pages
ISBN: 9781450380263
DOI: 10.1145/3400302
General Chair: Yuan Xie

In-Cooperation

  • IEEE CAS
  • IEEE CEDA
  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article

Funding Sources

  • Ministry of Science, ICT and Future Planning

Conference

ICCAD '20

Acceptance Rates

Overall acceptance rate: 457 of 1,762 submissions (26%)
