DOI: 10.1145/3665314.3670825
Research article · Open access

Sparrow ECC: A Lightweight ECC Approach for HBM Refresh Reduction towards Energy-efficient DNN Inference

Published: 09 September 2024 Publication History

Abstract

Exponential growth in deep neural network (DNN) model size has created significant memory-bandwidth demands, leading to the widespread adoption of high bandwidth memory (HBM) for DNN inference. However, because high operating temperatures shorten retention time, HBM requires more frequent refresh operations and therefore suffers larger refresh energy/performance overhead. In this paper, we propose Sparrow ECC, a lightweight yet stronger HBM ECC technique that enables fewer refresh operations while preserving inference accuracy. Sparrow ECC exploits the dominant exponent pattern (i.e., value similarity) in pre-trained DNN weights, limiting the exponent value range of the pre-trained weights to prevent anomalously large weight-value changes caused by errors. In addition, through duplication and a single error correction (SEC) code, Sparrow ECC strongly protects the critical bits in DNN weights. In our evaluation, when the proportion of 1→0 bit errors is 100% and 99%, Sparrow ECC reduces refresh energy consumption by 90.40% and 93.22% on average, respectively, compared to the state-of-the-art refresh-reduction technique (RS(19,17)+ZEM [22]), while preserving inference accuracy.
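The key observation behind exponent-range limiting can be illustrated with a minimal sketch: because pre-trained weights cluster in a narrow exponent range, an exponent field that falls outside that range after a bit error can be detected and clamped, bounding the magnitude of the corrupted value. This is an illustrative assumption-laden sketch, not the paper's encoding; the function names and the exponent window [100, 130] are hypothetical, and the duplication/SEC protection of critical bits is not shown.

```python
import struct

def fp32_exponent(x: float) -> int:
    """Extract the biased 8-bit exponent field of an IEEE-754 float32."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return (bits >> 23) & 0xFF

def clamp_exponent(x: float, e_min: int, e_max: int) -> float:
    """Clamp the exponent field into [e_min, e_max], keeping sign/mantissa.

    Models the idea that an out-of-range exponent (e.g., produced by a
    bit flip in a high-order exponent bit) is pulled back into the range
    actually occupied by the pre-trained weights, so the error cannot
    blow the weight up by many orders of magnitude.
    """
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    e = (bits >> 23) & 0xFF
    e_clamped = min(max(e, e_min), e_max)
    bits = (bits & ~(0xFF << 23)) | (e_clamped << 23)
    return struct.unpack(">f", struct.pack(">I", bits))[0]

# Example: a small weight whose exponent bit flips into a huge value.
w = 0.01                   # typical small weight, biased exponent 120
corrupted = w * 2 ** 64    # models a flipped high-order exponent bit
restored = clamp_exponent(corrupted, 100, 130)
```

After clamping, `restored` is bounded by the largest value representable with exponent 130 (below 16.0), instead of being roughly 10^17; the original value is not recovered, but the damage to the inference result is contained.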

References

[1]
Michael Anderson, et al. 2021. First-Generation Inference Accelerator Deployment at Facebook. arXiv:2107.04140.
[2]
Hyeong Kon Bae, et al. 2023. Twin ECC: A Data Duplication Based ECC for Strong DRAM Error Resilience. In DATE.
[3]
Bita Darvish Rouhani, et al. 2020. Pushing the Limits of Narrow Precision Inferencing at Cloud Scale with Microsoft Floating Point. In NeurIPS.
[4]
Jacob Devlin, et al. 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv:1810.04805.
[5]
Alexey Dosovitskiy, et al. 2020. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv:2010.11929.
[6]
Young-Ho Gong and Sung Woo Chung. 2016. Exploiting Refresh Effect of DRAM Read Operations: A Practical Approach to Low-Power Refresh. IEEE Trans. Comput. 65, 5.
[7]
Kaiming He, et al. 2016. Deep Residual Learning for Image Recognition. In CVPR.
[8]
Wei Huang, et al. 2024. BiLLM: Pushing the Limit of Post-Training Quantization for LLMs. arXiv:2402.04291.
[9]
Myeongjae Jang, et al. 2022. ENCORE Compression: Exploiting Narrow-width Values for Quantized Deep Neural Networks. In DATE.
[10]
Norman P. Jouppi, et al. 2021. Ten Lessons From Three Generations Shaped Google's TPUv4i: Industrial Product. In ISCA.
[11]
Sehoon Kim, et al. 2023. SqueezeLLM: Dense-and-Sparse Quantization. arXiv:2306.07629.
[12]
Taehwan Kim, et al. 2023. Thermal Improvement of HBM with Joint Thermal Resistance Reduction for Scaling 12 Stacks and Beyond. In ECTC.
[13]
Taehwan Kim, et al. 2022. Thermal Modeling and Analysis of High Bandwidth Memory in 2.5D Si-interposer Systems. In iTherm.
[14]
N-H Lee, et al. 2022. Transistor Reliability Characterization for Advanced DRAM with HK+MG & EUV process technology. In IRPS.
[15]
Young Seo Lee, et al. 2022. Stealth ECC: A Data-Width Aware Adaptive ECC Scheme for DRAM Error Resilience. In DATE.
[16]
Shang Li, et al. 2020. DRAMsim3: A Cycle-Accurate, Thermal-Capable DRAM Simulator. IEEE Comp. Archit. Lett. 19, 2.
[17]
Yong Liu, et al. 2022. New Insight into the Aging Induced Retention Time Degraded of Advanced DRAM Technology. In IRPS.
[18]
Chi-Keung Luk, et al. 2005. Pin: building customized program analysis tools with dynamic instrumentation. ACM SIGPLAN Not. 40, 6.
[19]
Deepak M. Mathew, et al. 2018. An analysis on retention error behavior and power consumption of recent DDR4 DRAMs. In DATE.
[20]
Vazgen Melikyan, et al. 2018. 14nm Educational Design Kit: Capabilities, Deployment and Future. In SSSS.
[21]
Duy-Thanh Nguyen, et al. 2019. St-DRC: Stretchable DRAM Refresh Controller with No Parity-overhead Error Correction Scheme for Energy-efficient DNNs. In DAC.
[22]
Duy-Thanh Nguyen, et al. 2021. ZEM: Zero-Cycle Bit-Masking Module for Deep Learning Refresh-Less DRAM. IEEE Access 9.
[23]
Shailja Pandey, et al. 2023. NeuroCool: Dynamic Thermal Management of 3D DRAM for Deep Neural Networks through Customized Prefetching. ACM Trans. Des. Autom. Electron. Syst. 29, 1.
[24]
Myeong-Jae Park, et al. 2022. A 192-Gb 12-High 896-GB/s HBM3 DRAM With a TSV Auto-Calibration Scheme and Machine-Learning-Based Layout Optimization. IEEE J. Solid-State Circuits 58, 1.
[25]
Adam Paszke, et al. 2019. Pytorch: An Imperative Style, High-Performance Deep Learning Library. In NeurIPS.
[26]
Salvatore Pontarelli, et al. 2014. Low Delay Single Symbol Error Correction Codes Based on Reed Solomon Codes. IEEE Trans. Comput. 64, 5.
[27]
I. Sofair. 2000. Probability of miscorrection for Reed-Solomon codes. In ITCC.
[28]
James E. Stine, et al. 2007. FreePDK: An Open-Source Variation-Aware Design Kit. In MSE.
[29]
Ashish Vaswani, et al. 2017. Attention is All you Need. In NeurIPS.
[30]
Yonghui Wu, et al. 2016. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. arXiv:1609.08144.
[31]
Yang Zhao, et al. 2019. Memory Trojan Attack on Neural Network Accelerators. In DATE.
[32]
JEDEC. 2023. JESD238A, High Bandwidth Memory DRAM (HBM3). Retrieved from https://www.jedec.org/standards-documents/docs/jesd238a.
[33]
Nvidia. 2024. NVIDIA H100 Tensor Core GPU. Retrieved from https://www.nvidia.com/en-us/data-center/h100/.


Information

      Published In

      ISLPED '24: Proceedings of the 29th ACM/IEEE International Symposium on Low Power Electronics and Design
August 2024, 384 pages
ISBN: 9798400706882
DOI: 10.1145/3665314
This work is licensed under a Creative Commons Attribution 4.0 International License.

      Sponsors

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Badges

      • Best Paper

      Author Tags

      1. deep neural networks
      2. DRAM refresh
      3. ECC
      4. energy efficiency


      Conference

      ISLPED '24

      Acceptance Rates

Overall Acceptance Rate: 398 of 1,159 submissions, 34%

