Task Mapping for Redundant Multithreading in Multi-Cores with Reliability and Performance Heterogeneity

Published: 01 November 2016

Abstract

Due to architectural design, process variations, and aging, individual cores in many-core systems exhibit heterogeneous performance. In such systems, a commonly adopted soft-error mitigation technique is Redundant Multithreading (RMT), which achieves error detection and recovery by executing redundant threads of an application on different cores. However, achieving resource-efficient reliability requires considering both the task mapping and the task execution mode (i.e., whether a task executes in a reliable mode with RMT or in an unreliable mode without RMT). This paper explores how to efficiently assign tasks to cores with heterogeneous performance properties and how to determine the execution modes of tasks so as to achieve high reliability while satisfying the tolerable timing constraint. We demonstrate that the task mapping problem under heterogeneous performance can be solved by employing the Hungarian Algorithm as a subroutine, which assigns the tasks to the cores so as to optimize system reliability in polynomial time. To obtain efficient task execution modes, we also propose an iterative mode adaptation technique that guarantees the tolerable timing constraint. Our results show that, compared to the state of the art, the proposed approaches achieve up to 80 percent reliability improvement (20 percent on average) under different scenarios of chip frequency variation maps.
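
The role of the Hungarian Algorithm in the mapping step can be illustrated with a minimal sketch. It is not the paper's formulation: it assumes a hypothetical matrix `reliability[t][c]` giving the probability that task `t` completes correctly on core `c`, assumes one task per core, and ignores execution modes and timing constraints. Under those assumptions, maximizing the product of per-task reliabilities is equivalent to minimizing the sum of `-log(reliability)`, which is a standard assignment problem. The names `hungarian` and `map_tasks` are illustrative only.

```python
import math


def hungarian(cost):
    """Minimum-cost assignment for a square cost matrix.

    Returns a list where entry t is the column (core) assigned to row (task) t.
    Uses the O(n^3) potentials formulation of the Hungarian method.
    """
    n = len(cost)
    inf = float("inf")
    u = [0.0] * (n + 1)      # row potentials (1-based)
    v = [0.0] * (n + 1)      # column potentials (1-based)
    match = [0] * (n + 1)    # match[j] = row matched to column j, 0 = none
    way = [0] * (n + 1)      # predecessor column on the augmenting path
    for i in range(1, n + 1):
        match[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0 = match[j0]
            delta = inf
            j1 = 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j] = cur
                        way[j] = j0
                    if minv[j] < delta:
                        delta = minv[j]
                        j1 = j
            for j in range(n + 1):
                if used[j]:
                    u[match[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if match[j0] == 0:
                break
        # Flip the matching along the augmenting path back to the root.
        while j0:
            j1 = way[j0]
            match[j0] = match[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[match[j] - 1] = j - 1
    return assignment


def map_tasks(reliability):
    """Assign each task to a distinct core, maximizing the reliability product.

    reliability[t][c] is a hypothetical per-task, per-core success probability
    in (0, 1]; it stands in for whatever reliability model is actually used.
    """
    cost = [[-math.log(r) for r in row] for row in reliability]
    assignment = hungarian(cost)
    system = math.prod(reliability[t][c] for t, c in enumerate(assignment))
    return assignment, system


# Illustrative values only: three tasks, three cores.
example = [
    [0.90, 0.99, 0.95],
    [0.98, 0.97, 0.80],
    [0.85, 0.96, 0.99],
]
assignment, system = map_tasks(example)
print(assignment, round(system, 6))
```

The log transform is what makes the problem fit the assignment framework, since the Hungarian method minimizes a sum of costs rather than maximizing a product. Choosing execution modes (RMT versus non-RMT) and enforcing the timing constraint would need additional logic that this sketch leaves out.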


Published In

IEEE Transactions on Computers, Volume 65, Issue 11
November 2016
285 pages

Publisher

IEEE Computer Society

United States

Qualifiers

  • Research-article
