DOI: 10.1145/2742060.2742078
short-paper

Reinforcement Learning for Thermal-aware Many-core Task Allocation

Published: 20 May 2015

Abstract

To maintain reliable operation, task allocation for many-core processors must account for the thermal interaction between processor cores and network-on-chip routers. Our approach employs reinforcement learning, a machine learning technique, to allocate tasks based on current core and router temperatures and on a prediction of which assignment will minimize the future maximum temperature. After each allocation, the algorithm updates its prediction models using feedback on the accuracy of previous predictions. We verify the algorithm through detailed many-core simulation that includes on-chip routing. Our results show that the proposed technique is fast (scheduling is performed in <1 ms) and reduces peak temperature by up to 8°C (4.3°C on average) in a 49-core processor versus a competing task allocation approach across a series of SPLASH-2 benchmarks.
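The predict, allocate, and update loop described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not give the model form, so the per-core linear predictor, the epsilon-greedy exploration, the feature scaling, the learning rate, and every name below (`ThermalAllocator`, `make_features`, `choose_core`, `update`) are assumptions made for illustration only.

```python
import random


def make_features(core_temps_c, router_temps_c):
    """Bias term plus temperatures scaled to roughly [0, 1] (scaling is an assumption)."""
    return [1.0] + [t / 100.0 for t in core_temps_c] + [t / 100.0 for t in router_temps_c]


class ThermalAllocator:
    """Toy allocator with one linear peak-temperature predictor per core (hypothetical)."""

    def __init__(self, n_cores, n_features, lr=0.1, epsilon=0.1, seed=0):
        self.weights = [[0.0] * n_features for _ in range(n_cores)]
        self.lr = lr            # gradient step size (assumed value)
        self.epsilon = epsilon  # probability of trying a random free core
        self.rng = random.Random(seed)

    def predict(self, core, features):
        """Predicted future peak temperature if the task is placed on `core`."""
        return sum(w * x for w, x in zip(self.weights[core], features))

    def choose_core(self, features, free_cores):
        """Pick the free core with the lowest predicted peak, exploring occasionally."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(free_cores)
        return min(free_cores, key=lambda c: self.predict(c, features))

    def update(self, core, features, observed_peak_c):
        """One gradient step on the squared prediction error for the chosen core."""
        error = observed_peak_c - self.predict(core, features)
        self.weights[core] = [
            w + self.lr * error * x for w, x in zip(self.weights[core], features)
        ]
        return error


# One allocation step: observe temperatures, choose a core, then learn from feedback.
allocator = ThermalAllocator(n_cores=4, n_features=9, epsilon=0.0)
state = make_features([60, 70, 65, 62], [55, 58, 60, 57])
core = allocator.choose_core(state, free_cores=[0, 1, 2, 3])
allocator.update(core, state, observed_peak_c=72.0)  # peak measured after the task ran
```

In this sketch the feedback signal is the gap between the predicted and the observed peak temperature, which mirrors the abstract's statement that the models are updated according to the accuracy of previous predictions. A real system would likely need richer state, such as task power profiles and the floorplan.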

Published In

GLSVLSI '15: Proceedings of the 25th edition on Great Lakes Symposium on VLSI
May 2015
418 pages
ISBN: 9781450334747
DOI: 10.1145/2742060

Sponsors

In-Cooperation

  • IEEE CEDA
  • IEEE CASS

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. reinforcement learning
  2. task allocation
  3. thermal aware

Qualifiers

  • Short-paper

Conference

GLSVLSI '15
GLSVLSI '15: Great Lakes Symposium on VLSI 2015
May 20 - 22, 2015
Pittsburgh, Pennsylvania, USA

Acceptance Rates

GLSVLSI '15 Paper Acceptance Rate 41 of 148 submissions, 28%;
Overall Acceptance Rate 312 of 1,156 submissions, 27%

Cited By

  • (2024) HQ-DTM: A Hierarchical Q-learning Algorithm for Dynamic Thermal Management of Multi-core Processors. Proceedings of the 29th ACM/IEEE International Symposium on Low Power Electronics and Design, pp. 1-6. DOI: 10.1145/3665314.3670842. Online publication date: 5-Aug-2024
  • (2023) Learning-Oriented Reliability Improvement of Computing Systems From Transistor to Application Level. 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 1-10. DOI: 10.23919/DATE56975.2023.10137182. Online publication date: Apr-2023
  • (2023) NPU-Accelerated Imitation Learning for Thermal Optimization of QoS-Constrained Heterogeneous Multi-Cores. ACM Transactions on Design Automation of Electronic Systems 29:1, pp. 1-23. DOI: 10.1145/3626320. Online publication date: 15-Nov-2023
  • (2023) Dynamic Power Management in Large Manycore Systems: A Learning-to-Search Framework. ACM Transactions on Design Automation of Electronic Systems 28:5, pp. 1-21. DOI: 10.1145/3603501. Online publication date: 8-Sep-2023
  • (2023) B-TSP: An Advanced Power Safe Management Strategy for modern Multi-core Platforms under Thermal-Aware Design. Proceedings of the 31st International Conference on Real-Time Networks and Systems, pp. 34-44. DOI: 10.1145/3575757.3593659. Online publication date: 7-Jun-2023
  • (2023) Hot-Trim: Thermal and Reliability Management for Commercial Multicore Processors Considering Workload Dependent Hot Spots. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 42:7, pp. 2290-2302. DOI: 10.1109/TCAD.2022.3216552. Online publication date: Jul-2023
  • (2023) Boreas: A Cost-Effective Mitigation Method for Advanced Hotspots using Machine Learning and Hardware Telemetry. 2023 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 295-305. DOI: 10.1109/ISPASS57527.2023.00036. Online publication date: Apr-2023
  • (2022) NPU-Accelerated Imitation Learning for Thermal- and QoS-Aware Optimization of Heterogeneous Multi-Cores. 2022 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 584-587. DOI: 10.23919/DATE54114.2022.9774681. Online publication date: 14-Mar-2022
  • (2022) A Survey of Machine Learning for Computer Architecture and Systems. ACM Computing Surveys 55:3, pp. 1-39. DOI: 10.1145/3494523. Online publication date: 3-Feb-2022
  • (2022) LBF-NoC: Learning-Based Framework to Predict Performance, Power and Area for Network-On-Chip Architectures. Journal of Circuits, Systems and Computers 31:11. DOI: 10.1142/S0218126622501961. Online publication date: 29-Apr-2022
