research-article

NeuroCool: Dynamic Thermal Management of 3D DRAM for Deep Neural Networks through Customized Prefetching

Published: 18 December 2023

Abstract

Deep neural network (DNN) implementations are typically characterized by huge datasets and concurrent computation, resulting in a demand for high memory bandwidth due to intensive data movement between processors and off-chip memory. Performing DNN inference on general-purpose cores and at the edge is gaining traction as a way to enhance user experience and reduce latency. The mismatch between CPU and conventional DRAM speeds leads to under-utilization of compute capabilities, increasing inference time. 3D DRAM is a promising solution for meeting the bandwidth requirements of high-throughput DNNs. However, due to the high power density of stacked architectures, 3D DRAMs need dynamic thermal management (DTM), which incurs performance overhead due to memory-induced CPU throttling.
We study the thermal impact of DNN applications running on a 3D DRAM system and make a case for a memory temperature-aware customized prefetch mechanism to reduce DTM overheads and significantly improve performance. In our proposed NeuroCool DTM policy, we intelligently place either DRAM ranks or tiers in a low-power state, using the DNN layer characteristics and access rate. We demonstrate that our approach generalizes through training and test datasets comprising diverse data points from widely used DNN applications. Experimental results on popular DNNs show that NeuroCool achieves an average performance gain of 44% (as high as 52%) and a memory energy improvement of 43% (as high as 69%) over general-purpose DTM policies.

Published In

ACM Transactions on Design Automation of Electronic Systems, Volume 29, Issue 1
January 2024
521 pages
EISSN: 1557-7309
DOI: 10.1145/3613510

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 December 2023
Online AM: 23 October 2023
Accepted: 08 October 2023
Revised: 11 July 2023
Received: 03 September 2022
Published in TODAES Volume 29, Issue 1

Author Tags

  1. 3D DRAM
  2. dynamic thermal management
  3. customized prefetching

Qualifiers

  • Research-article

Funding Sources

  • Semiconductor Research Corporation (SRC)
