Systematic evaluation of workload clustering for extremely energy-efficient architectures

Published: 29 May 2013

Abstract

Chip power consumption has reached its limits, flattening single-core performance. We propose the 10x10 processor, a federated heterogeneous multi-core architecture in which each core is an ensemble of u-engines (micro-engines, similar to accelerators), each specialized for a different workload group, to achieve dramatically higher energy efficiency. Collectively, the u-engines target the entire general-purpose workload space.
The problem we study in this article is selecting the set of workloads for which each u-engine should be customized. To address it, we analyze the computation structure of a wide variety of workloads and cluster together those with similar structures, so that each u-engine can be customized for the compute structures exhibited by one cluster. The constraint on this problem is the processor's silicon budget. A lower silicon budget accommodates fewer u-engines, so each u-engine must target a larger segment of the workload space. Because the compute structures within each cluster then vary more, customization yields smaller energy-efficiency benefits. We therefore also study how workload coverage and benefit can be maximized for a given silicon budget.
We study a broad general-purpose workload comprising 34 codes from 6 benchmark suites. We identify the most frequently executed functions and cluster them, using two sets of instruction-usage features (high-resolution and low-resolution), into 8, 16, 32, 64, and 128 clusters. We develop abstract metrics (coverage and weighted customization benefit) to evaluate the clusters. We show significant potential payoffs under four benefit models: 2-3x (square root model), 4-10x (linear model), 12-24x (quadratic model), and 22-26x (cubic model).
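The budget tradeoff described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and does not come from the article: the fixed per-engine area, the single one-dimensional feature (fraction of floating-point instructions), the sample values, and the simple contiguous split used in place of a real clustering method. It only shows that fewer u-engines force wider, more varied clusters.

```python
def engines_for_budget(budget_mm2, area_per_engine_mm2):
    """Number of u-engines that fit, assuming a fixed (hypothetical) area each."""
    return int(budget_mm2 // area_per_engine_mm2)


def split_contiguous(values, k):
    """Split sorted values into k contiguous groups of near-equal size."""
    ordered = sorted(values)
    size, extra = divmod(len(ordered), k)
    groups, start = [], 0
    for i in range(k):
        end = start + size + (1 if i < extra else 0)
        groups.append(ordered[start:end])
        start = end
    return groups


def mean_spread(groups):
    """Average (max - min) per group: a crude proxy for intra-cluster variation."""
    return sum(max(g) - min(g) for g in groups) / len(groups)


# Hypothetical per-function feature: fraction of floating-point instructions.
fp_fraction = [0.02, 0.05, 0.08, 0.30, 0.35, 0.40, 0.70, 0.75]

spread_by_budget = {}
for budget in (20.0, 40.0, 80.0):           # hypothetical silicon budgets in mm^2
    k = engines_for_budget(budget, 10.0)    # hypothetical 10 mm^2 per u-engine
    spread_by_budget[budget] = mean_spread(split_contiguous(fp_fraction, k))

for budget, spread in spread_by_budget.items():
    print(f"budget {budget:5.1f} mm^2 -> mean cluster spread {spread:.4f}")
```

In this toy setup the mean spread shrinks as the budget grows, which mirrors the claim that larger budgets allow tighter clusters and therefore more effective customization.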
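As a rough sketch of the clustering step, the code below groups functions by instruction-usage vectors using agglomerative clustering with Euclidean distance. The function names, the four feature categories, the feature values, and the choice of average linkage are all assumptions for illustration; the abstract does not state which linkage or feature encoding the authors used.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, features):
    """Mean pairwise distance between members of two clusters."""
    total = sum(euclidean(features[i], features[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def cluster(features, k):
    """Merge the two closest clusters repeatedly until only k remain."""
    clusters = [[name] for name in sorted(features)]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_linkage(clusters[i], clusters[j], features)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical instruction mix per function: (integer, floating-point, memory, branch).
features = {
    "fft_butterfly": (0.10, 0.70, 0.15, 0.05),
    "dgemm_kernel": (0.05, 0.75, 0.15, 0.05),
    "sha_round": (0.80, 0.00, 0.10, 0.10),
    "crc32_update": (0.75, 0.00, 0.15, 0.10),
    "string_match": (0.40, 0.00, 0.30, 0.30),
    "tree_walk": (0.30, 0.00, 0.40, 0.30),
}

groups = cluster(features, 3)
for g in groups:
    print(sorted(g))
```

With these made-up vectors, the floating-point-heavy, integer-heavy, and memory/branch-heavy functions end up in separate clusters. A high-resolution feature set would simply use longer vectors (more instruction categories) with the same procedure.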

Published In

ACM SIGARCH Computer Architecture News, Volume 41, Issue 2
May 2013
71 pages
ISSN: 0163-5964
DOI: 10.1145/2490302

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Column
