DOI: 10.1145/3665314.3670825
Research article · Open access

Sparrow ECC: A Lightweight ECC Approach for HBM Refresh Reduction towards Energy-efficient DNN Inference

Published: 09 September 2024 Publication History

Abstract

Exponential growth in deep neural network (DNN) model size has created significant memory-bandwidth demands, leading to the widespread adoption of high bandwidth memory (HBM) for DNN inference. However, because high operating temperatures shorten retention time, HBM requires more frequent refresh operations and therefore suffers larger refresh energy/performance overhead. In this paper, we propose Sparrow ECC, a lightweight yet stronger HBM ECC technique that enables fewer refresh operations while preserving inference accuracy. Sparrow ECC exploits the dominant exponent pattern (i.e., value similarity) in pre-trained DNN weights, limiting the exponent value range of the pre-trained weights to prevent anomalously large weight-value changes caused by errors. In addition, through duplication and a single error correction (SEC) code, Sparrow ECC strongly protects the critical bits in DNN weights. In our evaluation, when the proportion of 1→0 bit errors is 100% and 99%, Sparrow ECC reduces refresh energy consumption by 90.40% and 93.22% on average, respectively, compared to the state-of-the-art refresh-reduction technique (RS(19,17)+ZEM [22]), while preserving inference accuracy.
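The key observation behind exponent-range limiting can be illustrated with a minimal sketch: because pre-trained weights cluster in a narrow exponent range, an exponent field that falls outside that range after a bit error can be detected and clamped, bounding the magnitude of the corrupted value. This is an illustrative assumption-laden sketch, not the paper's encoding; the function names and the exponent window [100, 130] are hypothetical, and the duplication/SEC protection of critical bits is not shown.

```python
import struct

def fp32_exponent(x: float) -> int:
    """Extract the biased 8-bit exponent field of an IEEE-754 float32."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return (bits >> 23) & 0xFF

def clamp_exponent(x: float, e_min: int, e_max: int) -> float:
    """Clamp the exponent field into [e_min, e_max], keeping sign/mantissa.

    Models the idea that an out-of-range exponent (e.g., produced by a
    bit flip in a high-order exponent bit) is pulled back into the range
    actually occupied by the pre-trained weights, so the error cannot
    blow the weight up by many orders of magnitude.
    """
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    e = (bits >> 23) & 0xFF
    e_clamped = min(max(e, e_min), e_max)
    bits = (bits & ~(0xFF << 23)) | (e_clamped << 23)
    return struct.unpack(">f", struct.pack(">I", bits))[0]

# Example: a small weight whose exponent bit flips into a huge value.
w = 0.01                   # typical small weight, biased exponent 120
corrupted = w * 2 ** 64    # models a flipped high-order exponent bit
restored = clamp_exponent(corrupted, 100, 130)
```

After clamping, `restored` is bounded by the largest value representable with exponent 130 (below 16.0), instead of being roughly 10^17; the original value is not recovered, but the damage to the inference result is contained.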

References

[1]
Michael Anderson, et al. 2021. First-Generation Inference Accelerator Deployment at Facebook. arXiv:2107.04140.
[2]
Hyeong Kon Bae, et al. 2023. Twin ECC: A Data Duplication Based ECC for Strong DRAM Error Resilience. In DATE.
[3]
Bita Darvish Rouhani, et al. 2020. Pushing the Limits of Narrow Precision Inferencing at Cloud Scale with Microsoft Floating Point. In NeurIPS.
[4]
Jacob Devlin, et al. 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv:1810.04805.
[5]
Alexey Dosovitskiy, et al. 2020. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv:2010.11929.
[6]
Young-Ho Gong and Sung Woo Chung. 2016. Exploiting Refresh Effect of DRAM Read Operations: A Practical Approach to Low-Power Refresh. IEEE Trans. Comput. 65, 5.
[7]
Kaiming He, et al. 2016. Deep Residual Learning for Image Recognition. In CVPR.
[8]
Wei Huang, et al. 2024. BiLLM: Pushing the Limit of Post-Training Quantization for LLMs. arXiv:2402.04291.
[9]
Myeongjae Jang, et al. 2022. ENCORE Compression: Exploiting Narrow-width Values for Quantized Deep Neural Networks. In DATE.
[10]
Norman P. Jouppi, et al. 2021. Ten Lessons From Three Generations Shaped Google's TPUv4i: Industrial Product. In ISCA.
[11]
Sehoon Kim, et al. 2023. SqueezeLLM: Dense-and-Sparse Quantization. arXiv:2306.07629.
[12]
Taehwan Kim, et al. 2023. Thermal Improvement of HBM with Joint Thermal Resistance Reduction for Scaling 12 Stacks and Beyond. In ECTC.
[13]
Taehwan Kim, et al. 2022. Thermal Modeling and Analysis of High Bandwidth Memory in 2.5D Si-interposer Systems. In iTherm.
[14]
N-H Lee, et al. 2022. Transistor Reliability Characterization for Advanced DRAM with HK+MG & EUV process technology. In IRPS.
[15]
Young Seo Lee, et al. 2022. Stealth ECC: A Data-Width Aware Adaptive ECC Scheme for DRAM Error Resilience. In DATE.
[16]
Shang Li, et al. 2020. DRAMsim3: A Cycle-Accurate, Thermal-Capable DRAM Simulator. IEEE Comp. Archit. Lett. 19, 2.
[17]
Yong Liu, et al. 2022. New Insight into the Aging Induced Retention Time Degraded of Advanced DRAM Technology. In IRPS.
[18]
Chi-Keung Luk, et al. 2005. Pin: building customized program analysis tools with dynamic instrumentation. ACM SIGPLAN Not. 40, 6.
[19]
Deepak M. Mathew, et al. 2018. An analysis on retention error behavior and power consumption of recent DDR4 DRAMs. In DATE.
[20]
Vazgen Melikyan, et al. 2018. 14nm Educational Design Kit: Capabilities, Deployment and Future. In SSSS.
[21]
Duy-Thanh Nguyen, et al. 2019. St-DRC: Stretchable DRAM Refresh Controller with No Parity-overhead Error Correction Scheme for Energy-efficient DNNs. In DAC.
[22]
Duy-Thanh Nguyen, et al. 2021. ZEM: Zero-Cycle Bit-Masking Module for Deep Learning Refresh-Less DRAM. IEEE Access 9.
[23]
Shailja Pandey, et al. 2023. NeuroCool: Dynamic Thermal Management of 3D DRAM for Deep Neural Networks through Customized Prefetching. ACM Trans. Des. Autom. Electron. Syst. 29, 1.
[24]
Myeong-Jae Park, et al. 2022. A 192-Gb 12-High 896-GB/s HBM3 DRAM With a TSV Auto-Calibration Scheme and Machine-Learning-Based Layout Optimization. IEEE J. Solid-State Circuits 58, 1.
[25]
Adam Paszke, et al. 2019. Pytorch: An Imperative Style, High-Performance Deep Learning Library. In NeurIPS.
[26]
Salvatore Pontarelli, et al. 2014. Low Delay Single Symbol Error Correction Codes Based on Reed Solomon Codes. IEEE Trans. Comput. 64, 5.
[27]
I. Sofair. 2000. Probability of miscorrection for Reed-Solomon codes. In ITCC.
[28]
James E. Stine, et al. 2007. FreePDK: An Open-Source Variation-Aware Design Kit. In MSE.
[29]
Ashish Vaswani, et al. 2017. Attention is All you Need. In NeurIPS.
[30]
Yonghui Wu, et al. 2016. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. arXiv:1609.08144.
[31]
Yang Zhao, et al. 2019. Memory Trojan Attack on Neural Network Accelerators. In DATE.
[32]
JEDEC. 2023. JESD238A, High Bandwidth Memory DRAM (HBM3). Retrieved from https://www.jedec.org/standards-documents/docs/jesd238a.
[33]
Nvidia. 2024. NVIDIA H100 Tensor Core GPU. Retrieved from https://www.nvidia.com/en-us/data-center/h100/.


Information

      Published In

      ISLPED '24: Proceedings of the 29th ACM/IEEE International Symposium on Low Power Electronics and Design
August 2024, 384 pages
ISBN: 9798400706882
DOI: 10.1145/3665314
This work is licensed under a Creative Commons Attribution 4.0 International License.

      Sponsors

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Badges

      • Best Paper

      Author Tags

      1. deep neural networks
      2. DRAM refresh
      3. ECC
      4. energy efficiency


      Conference

      ISLPED '24

      Acceptance Rates

Overall Acceptance Rate: 398 of 1,159 submissions, 34%

