Exploiting Idle Hardware to Provide Low Overhead Fault Tolerance for VLIW Processors

Published: 06 January 2017

Abstract

Because of technology scaling, the soft error rate of digital circuits has been increasing, which affects system reliability. Modern processors, including VLIW architectures, must therefore provide means to mitigate such effects to guarantee reliable computing. In this scenario, our work proposes three low-overhead fault tolerance approaches based on instruction duplication with zero-latency detection, which use a rollback mechanism to correct soft errors in the pipelanes of a configurable VLIW processor. The first uses issue slots that remain idle over a period of time to execute duplicated instructions, taking distinct application phases into account. The second works at a finer grain, adaptively exploiting idle functional units at run-time. However, some applications exhibit high instruction-level parallelism (ILP), which limits the ability to provide fault tolerance: fewer functional units are idle, so fewer instructions can be duplicated. The third approach addresses this issue by dynamically reducing ILP according to a configurable threshold, increasing fault tolerance at the cost of performance. While the first two approaches achieve significant fault coverage with minimal area and power overhead for applications with low ILP, the third improves fault tolerance with low performance degradation. All approaches are evaluated in terms of area, performance, power dissipation, and error coverage.
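The trade-off between the second and third approaches can be illustrated with a toy scheduling model. This is a minimal sketch under stated assumptions, not the authors' implementation: the bundle representation, the function names (`limit_ilp`, `duplicate_into_idle_slots`, `schedule`), the 4-issue width, and the rule of duplicating the first operations of a bundle into its idle slots are all hypothetical. The sketch does not model zero-latency comparison or rollback; it only counts how many operations receive a redundant copy and how many cycles the schedule takes.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical model: a VLIW bundle is a list of operation names that issue in
# the same cycle. A duplicate of operation "x" is written as "x'".
Bundle = List[str]


@dataclass
class Schedule:
    bundles: List[Bundle]  # slots actually issued per cycle, including duplicates
    original_ops: int      # operations coming from the program
    duplicated_ops: int    # operations that also received a redundant copy

    @property
    def cycles(self) -> int:
        return len(self.bundles)

    @property
    def coverage(self) -> float:
        """Fraction of original operations protected by a duplicate."""
        if self.original_ops == 0:
            return 1.0
        return self.duplicated_ops / self.original_ops


def limit_ilp(bundle: Bundle, threshold: int) -> List[Bundle]:
    """Split a bundle into consecutive chunks of at most `threshold` operations.

    Assumes splitting is semantically safe, i.e. no operation reads a register
    written by another operation of the same bundle; real hardware must check this.
    """
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    chunks = [bundle[i:i + threshold] for i in range(0, len(bundle), threshold)]
    return chunks or [[]]  # keep an empty bundle as one (nop) cycle


def duplicate_into_idle_slots(bundle: Bundle, width: int) -> Bundle:
    """Fill idle issue slots with copies of operations already in the bundle."""
    idle = max(width - len(bundle), 0)
    copies = [op + "'" for op in bundle[:idle]]
    return bundle + copies


def schedule(program: List[Bundle], width: int,
             threshold: Optional[int] = None) -> Schedule:
    """Issue every bundle, optionally reducing ILP first, and count duplicates."""
    issued: List[Bundle] = []
    original = duplicated = 0
    for bundle in program:
        if len(bundle) > width:
            raise ValueError("bundle exceeds the issue width")
        parts = limit_ilp(bundle, threshold) if threshold is not None else [bundle]
        for part in parts:
            slots = duplicate_into_idle_slots(part, width)
            issued.append(slots)
            original += len(part)
            duplicated += len(slots) - len(part)
    return Schedule(issued, original, duplicated)


if __name__ == "__main__":
    prog = [["add", "mul", "ld", "sub"], ["add2"], ["mul2", "st"]]
    base = schedule(prog, width=4)
    reduced = schedule(prog, width=4, threshold=2)
    print(base.cycles, round(base.coverage, 2))  # 3 0.43
    print(reduced.cycles, reduced.coverage)      # 4 1.0
```

With a 4-issue width, the unthrottled schedule protects 3 of 7 operations in 3 cycles, because the fully occupied first bundle leaves no idle slot. Capping each bundle at 2 operations protects all 7 at the cost of one extra cycle, which mirrors the coverage-versus-performance trade-off the abstract describes for the third approach.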

Information

Published In

ACM Journal on Emerging Technologies in Computing Systems, Volume 13, Issue 2
Special Issue on Nanoelectronic Circuit and System Design Methods for the Mobile Computing Era and Regular Papers
April 2017
377 pages
ISSN:1550-4832
EISSN:1550-4840
DOI:10.1145/3014160
  • Editor: Yuan Xie
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 06 January 2017
Accepted: 01 September 2016
Revised: 01 August 2016
Received: 01 September 2015
Published in JETC Volume 13, Issue 2

Author Tags

  1. Fault tolerance
  2. VLIW
  3. adaptive processor
  4. soft errors

Qualifiers

  • Research-article
  • Research
  • Refereed
