research-article

Dynamic Cache Pooling in 3D Multicore Processors

Authors:

Tiansheng Zhang,

Ayse K. Coskun

ACM Journal on Emerging Technologies in Computing Systems (JETC), Volume 12, Issue 2

Article No.: 14, Pages 1 - 21

https://doi.org/10.1145/2700247

Published: 02 September 2015

Abstract

Resource pooling, where multiple architectural components are shared among cores, is a promising technique for improving system energy efficiency and reducing total chip area. 3D stacked multicore processors enable efficient pooling of cache resources owing to the short interconnect latency between vertically stacked layers. This article first introduces a 3D multicore architecture that provides poolable cache resources. We then propose a runtime management policy to improve energy efficiency in 3D systems by utilizing the flexible heterogeneity of cache resources. Our policy dynamically allocates jobs to cores on the 3D system while partitioning cache resources based on cache hungriness of the jobs. We investigate the impact of the proposed cache resource pooling architecture and management policy in 3D systems, both with and without on-chip DRAM. We evaluate the performance, energy efficiency, and thermal behavior for a wide range of workloads running on 3D systems. Experimental results demonstrate that the proposed architecture and policy reduce system energy-delay product (EDP) and energy-delay-area product (EDAP) by 18.8% and 36.1% on average, respectively, in comparison to 3D processors with static cache sizes.
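The two metrics quoted in the abstract can be made concrete with a small sketch. It uses the standard definitions (EDP is energy multiplied by delay; EDAP additionally multiplies by chip area) and reports the relative reduction against a baseline. The `RunStats` type, the function names, and all input numbers are hypothetical and chosen only for illustration; they are not taken from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunStats:
    """Measurements for one workload run (hypothetical inputs)."""
    energy_j: float   # total energy in joules
    delay_s: float    # execution time in seconds
    area_mm2: float   # chip area in square millimetres


def edp(run: RunStats) -> float:
    """Energy-delay product: energy multiplied by execution time."""
    return run.energy_j * run.delay_s


def edap(run: RunStats) -> float:
    """Energy-delay-area product: EDP multiplied by chip area."""
    return edp(run) * run.area_mm2


def reduction_percent(baseline: float, proposed: float) -> float:
    """Relative reduction of a lower-is-better metric, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - proposed) / baseline


# Illustrative numbers only; they are not results from the article.
static_cache = RunStats(energy_j=10.0, delay_s=2.0, area_mm2=100.0)
pooled_cache = RunStats(energy_j=9.0, delay_s=2.0, area_mm2=80.0)

edp_gain = reduction_percent(edp(static_cache), edp(pooled_cache))
edap_gain = reduction_percent(edap(static_cache), edap(pooled_cache))
print(f"EDP reduction: {edp_gain:.1f}%, EDAP reduction: {edap_gain:.1f}%")
```

Because EDAP folds area into the product, a design that saves area as well as energy shows a larger EDAP reduction than EDP reduction, which matches the direction of the figures reported in the abstract.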

Published In

ACM Journal on Emerging Technologies in Computing Systems Volume 12, Issue 2

Special Issue on Advances in Design of Ultra-Low Power Circuits and Systems in Emerging Technologies

August 2015

191 pages

ISSN: 1550-4832

EISSN: 1550-4840

DOI: 10.1145/2820112

Editor:
Krishnendu Chakrabarty
Duke University, USA

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Journal Family

ACM Journals for the Design of Smart and Connected Systems

Publication History

Published: 02 September 2015

Accepted: 01 September 2014

Revised: 01 June 2014

Received: 01 December 2013

Published in JETC Volume 12, Issue 2

Qualifiers

Research-article
Research
Refereed

Funding Sources

National Science Foundation
