DOI: 10.5555/2830840.2830849
research-article

dsReliM: power-constrained reliability management in dark-silicon many-core chips under process variations

Published: 04 October 2015

Abstract

Due to the tight power envelope, it is envisaged that in future technology nodes not all cores in a many-core chip can be powered on simultaneously (at full performance level). The power-gated cores are referred to as Dark Silicon. At the same time, growing reliability issues due to process variations and soft errors challenge the cost-effective deployment of future technology nodes. This paper presents a reliability management system for Dark Silicon chips (dsReliM) that optimizes the reliability of on-chip systems while jointly accounting for soft errors, process variations, and the thermal design power (TDP) constraint. For TDP-constrained reliability optimization, dsReliM leverages multiple reliable application versions that can potentially execute on different cores with frequency variations and support for different voltage-frequency levels, thus facilitating distinct power, reliability, and performance tradeoffs at run time. Experiments show that dsReliM provides up to 20% reliability improvement under different TDP constraints compared to a state-of-the-art technique. Compared to an ideal-case optimal solution, dsReliM deviates by up to 2.5% in reliability efficiency but speeds up the reliability management decision time by a factor of up to 3100.
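
To make the optimization problem in the abstract concrete, the following is a minimal sketch of TDP-constrained selection, not the dsReliM algorithm itself, which the abstract does not describe. It assumes each task has a small set of hypothetical options (an application version paired with a voltage-frequency level), each with an assumed power draw and reliability, and it assumes system reliability is the product of per-task reliabilities. The function name `best_assignment`, the option labels, and all numbers are illustrative assumptions. Exhaustive search is used purely for clarity; it plays the role of an ideal-case reference rather than a fast run-time heuristic.

```python
from itertools import product
from math import prod

# Hypothetical per-task options: (label, power in watts, reliability).
# Labels and numbers are illustrative assumptions, not data from the paper.
OPTIONS = {
    "task_a": [("v1@low", 2.0, 0.90), ("v1@high", 4.0, 0.95), ("v2@high", 5.0, 0.99)],
    "task_b": [("v1@low", 1.0, 0.85), ("v2@low", 2.0, 0.93), ("v2@high", 3.0, 0.97)],
}


def best_assignment(options, tdp):
    """Pick one option per task to maximize the product of reliabilities
    while the summed power stays within the TDP budget.

    Returns (assignment, reliability, power), or None if nothing fits.
    """
    tasks = list(options)
    best = None
    # Enumerate every combination of one option per task (exhaustive search).
    for combo in product(*(options[t] for t in tasks)):
        power = sum(p for _, p, _ in combo)
        if power > tdp:
            continue  # violates the power budget
        # Assumption: tasks fail independently, so reliabilities multiply.
        reliability = prod(r for _, _, r in combo)
        if best is None or reliability > best[1]:
            assignment = {t: c[0] for t, c in zip(tasks, combo)}
            best = (assignment, reliability, power)
    return best


if __name__ == "__main__":
    print(best_assignment(OPTIONS, tdp=6.0))
```

The search space grows exponentially with the number of tasks, which illustrates why a run-time system would favor a faster heuristic over an exact solution.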

Published In

CODES '15: Proceedings of the 10th International Conference on Hardware/Software Codesign and System Synthesis
October 2015
242 pages
ISBN: 9781467383219

Publisher

IEEE Press

Author Tags

  1. constrained-optimization
  2. dark silicon
  3. many-core
  4. power-efficiency
  5. process variation
  6. reliability
  7. soft errors

Qualifiers

  • Research-article

Conference

ESWEEK'15: Eleventh Embedded System Week
October 4 - 9, 2015
Amsterdam, The Netherlands

Acceptance Rates

Overall Acceptance Rate 280 of 864 submissions, 32%

Article Metrics

  • Total Citations: 0
  • Total Downloads: 106
  • Downloads (Last 12 months): 2
  • Downloads (Last 6 weeks): 1

Reflects downloads up to 17 Jan 2025
