
Accelerating On-Chip Training with Ferroelectric-Based Hybrid Precision Synapse

Published: 12 January 2022

Abstract

In this article, we propose a hardware accelerator design that uses a ferroelectric field-effect transistor (FeFET)-based hybrid precision synapse (HPS) for deep neural network (DNN) on-chip training. The drain-erase scheme for FeFET programming is incorporated into both the FeFET HPS design and the FeFET buffer design. With drain erase, high-density FeFET buffers can be integrated on-chip to store the intermediate input/output activations and gradients, which reduces energy-consuming off-chip DRAM accesses. Architectural evaluation shows that the energy efficiency is improved by 1.2× ∼ 2.1× compared to other HPS-based designs and by 3.9× ∼ 6.0× compared to emerging non-volatile memory baselines. The chip area is reduced by 19% ∼ 36% compared with designs that use an SRAM on-chip buffer, even though the FeFET buffer capacity is increased. In addition, using the drain-erase scheme for FeFET programming reduces the chip area by 11% ∼ 28.5% compared with designs that use the body-erase scheme.
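For readers unfamiliar with the hybrid precision synapse idea behind the abstract, the following minimal Python sketch illustrates the general concept of splitting each weight into a non-volatile part (modeled here as multi-level FeFET states) and a volatile part that absorbs the frequent small gradient updates, with a periodic carry into the non-volatile bits. The bit widths, data layout, and carry policy are assumptions made purely for illustration, not the circuit or programming scheme described in the article.

```python
# Illustrative sketch only (assumed parameters, not the authors' implementation).
# A hybrid precision synapse (HPS) keeps most-significant bits (MSBs) in a
# non-volatile device and least-significant bits (LSBs) in a volatile element,
# so the FeFET is written only during occasional carry/transfer operations.
import numpy as np

MSB_BITS = 5          # assumed number of bits mapped to FeFET conductance levels
LSB_BITS = 3          # assumed number of bits in the volatile (frequently updated) part
LSB_RANGE = 2 ** LSB_BITS

class HybridPrecisionSynapse:
    def __init__(self, shape):
        self.msb = np.zeros(shape, dtype=np.int32)   # infrequently written non-volatile levels
        self.lsb = np.zeros(shape, dtype=np.int32)   # frequently written volatile levels

    def accumulate(self, grad_lsb):
        """Apply a small integer gradient update to the volatile LSBs only."""
        self.lsb += grad_lsb

    def transfer(self):
        """Periodic carry: fold accumulated LSB overflow into the non-volatile MSBs."""
        carry = self.lsb // LSB_RANGE                    # fixed-point carry (floor division)
        self.msb = np.clip(self.msb + carry,
                           -(2 ** (MSB_BITS - 1)), 2 ** (MSB_BITS - 1) - 1)
        self.lsb -= carry * LSB_RANGE

    def weight(self):
        """Effective weight seen by the in-memory multiply-accumulate."""
        return self.msb * LSB_RANGE + self.lsb

# Many small updates hit only the volatile part; the non-volatile device is
# written once per transfer interval.
syn = HybridPrecisionSynapse((2, 2))
for _ in range(20):
    syn.accumulate(np.ones((2, 2), dtype=np.int32))
syn.transfer()
print(syn.msb, syn.lsb, syn.weight())
```

In this toy setting the volatile LSBs absorb twenty update pulses while the non-volatile MSBs receive a single carry write, which is the qualitative reason an HPS reduces write pressure on the FeFET during training.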



      Published In

ACM Journal on Emerging Technologies in Computing Systems, Volume 18, Issue 2
April 2022, 411 pages
ISSN: 1550-4832
EISSN: 1550-4840
DOI: 10.1145/3508462
Editor: Ramesh Karri

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 12 January 2022
      Accepted: 01 June 2021
      Revised: 01 March 2021
      Received: 01 July 2020
      Published in JETC Volume 18, Issue 2

      Author Tags

1. deep neural network
2. emerging non-volatile memory
3. ferroelectric field-effect transistor (FeFET)
4. DNN hardware acceleration
5. in-memory computing
6. on-chip training

      Qualifiers

      • Research-article
      • Refereed

      Funding Sources

      • ASCENT
• SRC/DARPA JUMP
      • SONY
