research-article
Open access

Fine-Grain Power Breakdown of Modern Out-of-Order Cores and Its Implications on Skylake-Based Systems

Published: 16 December 2016

Abstract

Detailed analysis of power consumption at low system levels is becoming important as a means of reducing a system's overall power consumption and its thermal hot spots. This work presents a new power estimation method that exposes the power breakdown of an application running on a modern processor architecture such as the newly released Intel Skylake processor. It also provides a detailed power and performance characterization of the SPEC CPU2006 benchmarks, an analysis of the data using side-by-side power and performance breakdowns, and a few interesting case studies.

References

[1]
F. Bellosa. 2000. The benefits of event-driven energy accounting in power-sensitive systems. In Proceedings of the 9th ACM SIGOPS European Workshop: Beyond the PC: New Challenges for the Operating System. ACM, 37--42.
[2]
R. Bertran, M. Gonzàlez, X. Martorell, N. Navarro, and E. Ayguadé. 2010. Decomposable and responsive power models for multicore processors using performance counters. In Proceedings of the 24th ACM International Conference on Supercomputing. ACM, 147--158.
[3]
R. Bertran, M. Gonzàlez, X. Martorell, N. Navarro, and E. Ayguadé. 2013a. Counter-based power modeling methods: Top-down vs. bottom-up. The Computer Journal 56, 2, 198--213.
[4]
R. Bertran, M. Gonzàlez, X. Martorell, N. Navarro, and E. Ayguadé. 2013b. A systematic methodology to generate decomposable and responsive power models for CMPs. IEEE Transactions on Computers 62, 7, 1289--1302.
[5]
S. Bhunia and S. Mukhopadhyay (Eds.). 2010. Low-Power Variation-Tolerant Design in Nanometer Silicon. Springer-Verlag.
[6]
A. Carvalho. 2010. The new Linux ‘perf’ tools. Presented at the Linux Kongress, 2010. https://scholar.google.co.il/scholar?q=The+new+linux+perf+Carvalho.&btnG=&hl=en&as_sdt=0%2C5.
[7]
H. David, E. Gorbatov, U. R. Hanebutte, R. Khanna, and C. Le. 2010. RAPL: Memory power estimation and capping. In 2010 ACM/IEEE International Symposium on Low-Power Electronics and Design (ISLPED). IEEE, 189--194.
[8]
N. Firasta, M. Buxton, P. Jinbo, K. Nasri, and S. Kuo. 2008. Intel AVX: New frontiers in performance improvements and energy efficiency. Intel White Paper.
[9]
J. Haj-Yihia, Y. B. Asher, E. Rotem, A. Yasin, and R. Ginosar. 2015. Compiler-directed power management for superscalars. ACM Transactions on Architecture and Code Optimization (TACO) 11, 4, 48.
[10]
Jawad Haj-Yihia, Ahmad Yasin, Yosi ben Asher, and Avi Mendelson. 2016. Core Power breakdown tool. https://drive.google.com/open?id=0B3IgzCqRS5Q_ZGN0QVFqaWxxY28.
[11]
Intel Corporation. 2014. Intel® 64 and IA-32 Architectures Optimization Reference Manual, Appendix B.1 Intel. (as of August 2014).
[12]
Intel Corporation. 2015. “Intel open source”, online: http://download.01.org/perfmon/ [accessed October 8, 2015].
[13]
Intel Corporation. 2016a. Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3A: System Programming Guide, Part 1 [accessed January 2016].
[14]
Intel Corporation. 2016b. “6th Generation Intel® Processor Family -- Specification update”, online: http://www.intel.com/content/www/us/en/processors/core/desktop-6th-gen-core-family-spec-update.html [accessed August 2016].
[15]
C. Isci and M. Martonosi. 2003. Runtime power monitoring in high-end processors: Methodology and empirical data. In Proceedings of the 36th Annual IEEE/ACM International Symposium on Microarchitecture, 93. IEEE Computer Society.
[16]
C. Isci, A. Buyuktosunoglu, C. Y. Cher, P. Bose, and M. Martonosi. 2006. An analysis of efficient multi-core global power management policies: Maximizing performance for a given power budget. In Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture. IEEE Computer Society, 347--358.
[17]
A. Kleen. 2015. Toplev manual (pmu-tools), online: https://github.com/andikleen/pmu-tools/wiki/toplev-manual [accessed October 8, 2015].
[18]
M. D. Powell, A. Biswas, J. S. Emer, S. S. Mukherjee, B. R. Sheikh, and S. Yardi. 2009. CAMP: A technique to estimate per-structure power at run-time using a few simple parameters. In 2009 IEEE 15th International Symposium on High Performance Computer Architecture. IEEE, 289--300.
[19]
E. Rotem, R. Ginosar, U. C. Weiser, and A. Mendelson. 2014. Energy aware race to halt: A down to EARtH approach for platform energy management. IEEE Computer Architecture Letters 13, 1, 25--28.
[20]
E. Rotem, A. Naveh, A. Ananthakrishnan, E. Weissmann, and D. Rajwan. 2012. Power-management architecture of the Intel microarchitecture code-named Sandy Bridge. IEEE Micro 32, 2, 20--27.
[21]
Y. S. Shao and D. Brooks. 2013. ISA-independent workload characterization and its implications for specialized architectures. In Proceedings of the International Symposium on Performance Analysis of Systems and Software (ISPASS 2013), 245--255.
[22]
Y. S. Shao, B. Reagen, G. Y. Wei, and D. Brooks. 2014. Aladdin: A preRTL, power-performance accelerator simulator enabling large design space exploration of customized architectures. In Proceedings of the 41st Annual International Symposium on Computer Architecture (ISCA), 97--108.
[23]
K. Singh, M. Bhadauria, and S. A. McKee. 2009. Real time power estimation and thread scheduling via performance counters. ACM SIGARCH Comput. Architect. News 37, 2, 46--55.
[24]
THE GREEN500 SITES. 2013. http://www.green500.org (accessed December 12, 2013).
[25]
ThinkPad SMAPI kernel module version 0.40. http://tpctl.sourceforge.net/.
[26]
TOP 500 SUPERCOMPUTER SITES. 2013. http://www.top500.org/list/2013/06 (accessed December 12, 2013).
[27]
Vasileios Spiliopoulos, Andreas Sembrant, and Stefanos Kaxiras. 2012. Power-sleuth: A tool for investigating your program's power behavior. In Proceedings of the 2012 IEEE 20th International Symposium on Modeling, Analysis & Simulation of Computer and Telecommunication Systems (MASCOTS). IEEE.
[28]
S. Van den Steen, S. De Pestel, M. Mechri, S. Eyerman, T. Carlson, L. Eeckhout, E. Hagersten, and D. Black-Schaffer. 2015. Micro-architecture independent analytical processor performance and power modeling. In Proceedings of the 2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[29]
A. Yasin. 2014. A top-down method for performance analysis and counters architecture. Presented at the 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). https://scholar.google.co.il/scholar?q=A+top-down+method+for+performance+analysis+and+counters+architecture.&btnG=&hl=en&as_sdt=0%2C5.


      Reviews

      Karthik S Murthy

      Extreme-scale data centers and supercomputers draw many megawatts of power to function. By today's standards, drawing one megawatt of power roughly costs $1 million; hence, reduced power consumption is critical for these systems. Current and upcoming systems are very efficient in terms of power usage effectiveness (PUE); for example, Intel home-built data centers run at 1.06 PUE and Facebook's centers run at 1.078 PUE. These PUE numbers tell us that most of the power drawn on these systems is used for application execution, and therefore developers need to shoulder the responsibility of achieving power efficiency as well.

      Rotem et al. [1] show that detailed power modeling is necessary because it is not straightforward to gauge which strategy yields the more efficient power envelope: (1) running a processor at a high frequency to complete the application faster and then putting the processor to sleep, or (2) running a processor at a low frequency for a longer time to execute the application.

      This paper develops a tool that provides a fine-grained breakdown of the power consumed by different processor and sub-processor domains on the Intel Skylake system. Intel VTune helps identify performance bottlenecks by employing the top-down analysis method developed by Ahmad Yasin in 2014 [2]. Top-down analysis is built on the idea that studying performance counters in isolation is not as informative as studying them in groups. These (sub)groups of performance counters, that is, meta-performance counters, are useful in pinpointing whether the performance bottlenecks in the developer's application are frontend-bound or backend-bound or whether they have occurred due to misspeculations. Similarly, the tool built by the authors helps classify whether the power consumption in an application is frontend-bound, backend-bound, or due to misspeculation.

      To build these meta-performance counters, the authors had to identify weights for each of the performance counters that make up a meta-performance counter. To do so, they used a set of training microbenchmarks.

      Overall, the paper is very informative and nicely written. The experiments are substantial. However, as the authors admit, these counters were studied for one core and one p-state, which is hardly the case in the wild.

      Online Computing Reviews Service
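The weighting step mentioned in the review can be illustrated with a small, self-contained sketch. It is not the authors' tool or their model: it assumes a plain linear model (power ≈ static term + Σ weight × counter rate) fitted by ordinary least squares, and every name and number in it (`fit_weights`, `breakdown`, the three counter groups, the training rates and power readings) is hypothetical and synthetic.

```python
"""Sketch: fit per-counter-group power weights on synthetic microbenchmarks.

Assumption (not from the paper): power is modeled as
    static + sum(weight_i * rate_i)
and the weights are found by ordinary least squares.
"""


def solve_linear_system(a, b):
    """Solve a * x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: benchmarks do not separate the counters")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_weights(rates, power):
    """Return [static, w_1, ..., w_k] minimizing squared error (normal equations)."""
    x = [[1.0] + list(r) for r in rates]  # leading 1.0 models static power
    k = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * p for row, p in zip(x, power)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def breakdown(weights, rates, names):
    """Split estimated power into a static part and one part per counter group."""
    parts = {"static": weights[0]}
    for name, w, r in zip(names, weights[1:], rates):
        parts[name] = w * r
    return parts


# Hypothetical counter groups, loosely following the review's categories.
names = ["frontend", "backend", "bad_speculation"]

# Synthetic training set: each row is one microbenchmark's counter rates
# (arbitrary units) and the power measured while it ran (made-up watts).
train_rates = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0),
    (0.0, 0.0, 1.0),
    (0.5, 0.5, 0.2),
]
train_power = [2.0, 5.0, 7.0, 3.5, 6.3]

weights = fit_weights(train_rates, train_power)

# Apply the fitted weights to a hypothetical application's counter rates.
parts = breakdown(weights, (0.4, 0.3, 0.1), names)
for name, watts in parts.items():
    print(f"{name}: {watts:.2f} W")
```

Microbenchmarks that each stress a single group keep the fit well conditioned; if two groups always rise and fall together in the training set, the solver above raises an error rather than returning arbitrary weights.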


      Information

      Published In

      ACM Transactions on Architecture and Code Optimization  Volume 13, Issue 4
      December 2016
      648 pages
      ISSN:1544-3566
      EISSN:1544-3973
      DOI:10.1145/3012405
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 16 December 2016
      Accepted: 01 November 2016
      Revised: 01 October 2016
      Received: 01 June 2016
      Published in TACO Volume 13, Issue 4


      Author Tags

      1. Power
      2. energy
      3. performance-counters

      Qualifiers

      • Research-article
      • Research
      • Refereed


      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months): 197
      • Downloads (Last 6 weeks): 27
      Reflects downloads up to 03 Jan 2025
