research-article

NeuroCool: Dynamic Thermal Management of 3D DRAM for Deep Neural Networks through Customized Prefetching

Authors:

Shailja Pandey,

Preeti Ranjan Panda

ACM Transactions on Design Automation of Electronic Systems, Volume 29, Issue 1

Article No.: 19, Pages 1 - 35

https://doi.org/10.1145/3630012

Published: 18 December 2023

Abstract

Deep neural network (DNN) implementations are typically characterized by huge datasets and concurrent computation, resulting in a demand for high memory bandwidth due to intensive data movement between processors and off-chip memory. Performing DNN inference on general-purpose cores/edge is gaining traction as a way to enhance user experience and reduce latency. The mismatch between CPU and conventional DRAM speeds leads to under-utilization of the compute capabilities, causing increased inference time. 3D DRAM is a promising solution to effectively fulfill the bandwidth requirement of high-throughput DNNs. However, due to the high power density of stacked architectures, 3D DRAMs need dynamic thermal management (DTM), which incurs performance overhead due to memory-induced CPU throttling.

We study the thermal impact of DNN applications running on a 3D DRAM system, and make a case for a memory temperature-aware customized prefetch mechanism to reduce DTM overheads and significantly improve performance. In our proposed NeuroCool DTM policy, we intelligently place either DRAM ranks or tiers in a low-power state, using the DNN layer characteristics and access rate. We establish the generalization of our approach through training and test datasets comprising diverse data points from widely used DNN applications. Experimental results on popular DNNs show that NeuroCool achieves an average performance gain of 44% (as high as 52%) and a memory energy improvement of 43% (as high as 69%) over general-purpose DTM policies.
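To make the rank-versus-tier idea concrete, the following toy Python sketch picks which part of a stacked DRAM to put in a low-power state. It is not the NeuroCool algorithm, which the abstract does not specify. The grid layout `temps[tier][rank]`, the 85 °C limit, the normalized `access_rate`, the `rate_cutoff` value, and the rule that memory-intensive layers idle a rank while lighter layers idle a tier are all assumptions chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    kind: str   # "none", "rank", or "tier"
    index: int  # rank or tier to place in a low-power state (-1 for "none")


def choose_low_power_target(temps, access_rate, t_limit=85.0, rate_cutoff=0.5):
    """Toy policy; every threshold and rule here is a hypothetical placeholder.

    temps[tier][rank] holds temperatures in degrees Celsius.
    access_rate is the current DNN layer's memory access rate, normalized to [0, 1].
    """
    hottest = max(max(row) for row in temps)
    if hottest < t_limit:
        # No cell exceeds the assumed thermal limit, so nothing is throttled.
        return Decision("none", -1)

    n_tiers = len(temps)
    n_ranks = len(temps[0])
    if access_rate >= rate_cutoff:
        # Assumed rule: for a memory-intensive layer, idle only the rank
        # (vertical slice) with the highest peak temperature.
        rank_peak = [max(temps[t][r] for t in range(n_tiers)) for r in range(n_ranks)]
        return Decision("rank", rank_peak.index(max(rank_peak)))

    # Assumed rule: for a lighter layer, idle the hottest tier (horizontal die).
    tier_peak = [max(row) for row in temps]
    return Decision("tier", tier_peak.index(max(tier_peak)))


if __name__ == "__main__":
    grid = [[70.0, 72.0], [80.0, 90.0]]  # 2 tiers x 2 ranks, made-up values
    print(choose_low_power_target(grid, access_rate=0.8))
    print(choose_low_power_target(grid, access_rate=0.2))
```

A real policy would also have to account for the prefetcher, which the abstract names as the mechanism for hiding the latency of the idled region; the sketch leaves that out entirely.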

Index Terms

NeuroCool: Dynamic Thermal Management of 3D DRAM for Deep Neural Networks through Customized Prefetching
1. Computer systems organization
  1. Embedded and cyber-physical systems
    1. Embedded systems
2. Hardware

Information & Contributors

Information

Published In

ACM Transactions on Design Automation of Electronic Systems Volume 29, Issue 1

January 2024

521 pages

EISSN:1557-7309

DOI:10.1145/3613510

Editor:
X. Sharon Hu
University of Notre Dame, USA

Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Journal Family

ACM Journals for the Design of Smart and Connected Systems

Publication History

Published: 18 December 2023

Online AM: 23 October 2023

Accepted: 08 October 2023

Revised: 11 July 2023

Received: 03 September 2022

Published in TODAES Volume 29, Issue 1

Qualifiers

Research-article

Funding Sources

Semiconductor Research Corporation (SRC)

Cited By

S. Pandey and P. Panda. 2024. NeuroTAP: Thermal and Memory Access Pattern-Aware Data Mapping on 3D DRAM for Maximizing DNN Performance. ACM Transactions on Embedded Computing Systems 23, 6 (2024), 1–30. DOI: 10.1145/3677178. Online publication date: 11-Sep-2024.
https://dl.acm.org/doi/10.1145/3677178
H. Kim, S. Choi, J. Kong, Y. Gong, S. Chung, K. Dev, J. Yoo, and P. Meinerzhagen. 2024. Sparrow ECC: A Lightweight ECC Approach for HBM Refresh Reduction towards Energy-efficient DNN Inference. In Proceedings of the 29th ACM/IEEE International Symposium on Low Power Electronics and Design. 1–6. DOI: 10.1145/3665314.3670825. Online publication date: 5-Aug-2024.
https://dl.acm.org/doi/10.1145/3665314.3670825
S. Pandey, S. Sethi, and P. Panda. 2024. 3D-TemPo: Optimizing 3-D DRAM Performance Under Temperature and Power Constraints. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 43, 8 (2024), 2263–2276. DOI: 10.1109/TCAD.2024.3367235. Online publication date: Aug-2024.
https://doi.org/10.1109/TCAD.2024.3367235
A. Dalloo, A. Jaleel Humaidi, A. Al Mhdawi, and H. Al-Raweshidy. 2024. Approximate Computing: Concepts, Architectures, Challenges, Applications, and Future Directions. IEEE Access 12 (2024), 146022–146088. DOI: 10.1109/ACCESS.2024.3467375. Online publication date: 2024.
https://doi.org/10.1109/ACCESS.2024.3467375
