Dynamic Cache Pooling in 3D Multicore Processors

Published: 02 September 2015

Abstract

Resource pooling, where multiple architectural components are shared among cores, is a promising technique for improving system energy efficiency and reducing total chip area. 3D stacked multicore processors enable efficient pooling of cache resources owing to the short interconnect latency between vertically stacked layers. This article first introduces a 3D multicore architecture that provides poolable cache resources. We then propose a runtime management policy to improve energy efficiency in 3D systems by utilizing the flexible heterogeneity of cache resources. Our policy dynamically allocates jobs to cores on the 3D system while partitioning cache resources based on the cache hungriness of the jobs. We investigate the impact of the proposed cache resource pooling architecture and management policy in 3D systems, both with and without on-chip DRAM. We evaluate the performance, energy efficiency, and thermal behavior for a wide range of workloads running on 3D systems. Experimental results demonstrate that the proposed architecture and policy reduce system energy-delay product (EDP) and energy-delay-area product (EDAP) by 18.8% and 36.1% on average, respectively, in comparison to 3D processors with static cache sizes.
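
For readers unfamiliar with the two metrics: EDP is the product of energy and execution delay, and EDAP additionally multiplies by chip area, so lower is better for both. The Python sketch below shows how a relative reduction against a static-cache baseline could be computed. The function names and all input values are hypothetical illustrations, not measurements or code from the article.

```python
def edp(energy_j: float, delay_s: float) -> float:
    """Energy-delay product: energy (J) times execution delay (s)."""
    return energy_j * delay_s


def edap(energy_j: float, delay_s: float, area_mm2: float) -> float:
    """Energy-delay-area product: EDP times chip area (mm^2)."""
    return edp(energy_j, delay_s) * area_mm2


def reduction_pct(baseline: float, improved: float) -> float:
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline


# Hypothetical numbers, chosen only to illustrate the arithmetic.
static_cfg = {"energy_j": 10.0, "delay_s": 2.0, "area_mm2": 100.0}
pooled_cfg = {"energy_j": 9.0, "delay_s": 2.0, "area_mm2": 80.0}

edp_gain = reduction_pct(
    edp(static_cfg["energy_j"], static_cfg["delay_s"]),
    edp(pooled_cfg["energy_j"], pooled_cfg["delay_s"]),
)
edap_gain = reduction_pct(edap(**static_cfg), edap(**pooled_cfg))

print(f"EDP reduction:  {edp_gain:.1f}%")
print(f"EDAP reduction: {edap_gain:.1f}%")
```

Because EDAP includes area, a design that saves chip area can show a larger EDAP reduction than EDP reduction even when energy and delay change only modestly.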

Published In

ACM Journal on Emerging Technologies in Computing Systems, Volume 12, Issue 2
Special Issue on Advances in Design of Ultra-Low Power Circuits and Systems in Emerging Technologies
August 2015
191 pages
ISSN: 1550-4832
EISSN: 1550-4840
DOI: 10.1145/2820112
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 02 September 2015
Accepted: 01 September 2014
Revised: 01 June 2014
Received: 01 December 2013
Published in JETC Volume 12, Issue 2

Author Tags

  1. 3D stacking
  2. Policy
  3. cache resource pooling
  4. energy efficiency
  5. runtime policy

Qualifiers

  • Research-article
  • Research
  • Refereed
