DOI: 10.1109/MICRO.2014.15

Calculating Architectural Vulnerability Factors for Spatial Multi-bit Transient Faults

Published: 13 December 2014

Abstract

Reliability is an important design constraint in modern microprocessors, and one of the fundamental reliability challenges is combating the effects of transient faults. This requires extensive analysis, including significant fault modeling, to allow architects to make informed reliability tradeoffs. Recent data shows that multi-bit transient faults are becoming more common, increasing from 0.5% of static random-access memory (SRAM) faults in 180nm to 3.9% in 22nm. Such faults are predicted to be even more prevalent in smaller technology nodes. Therefore, accurately modeling the effects of multi-bit transient faults is increasingly important to the microprocessor design process.
Architectural vulnerability factor (AVF) analysis is a method to model the effects of single-bit transient faults. In this paper, we propose a method to calculate AVFs for spatial multi-bit transient faults (MB-AVFs) and provide insights that can help reduce the impact of these faults. First, we describe a novel multi-bit AVF analysis approach for detected uncorrected errors (DUEs) and show how to measure DUE MB-AVFs in a performance simulator. We then extend our approach to measure silent data corruption (SDC) MB-AVFs. We find that MB-AVFs are not derivable from single-bit AVFs. We also find that larger fault modes have higher MB-AVFs. Finally, we present a case study on using MB-AVF analysis to optimize processor design, yielding SDC reductions of 86% in a GPU vector register file.
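
The claim that MB-AVFs are not derivable from single-bit AVFs can be illustrated with a toy model. The sketch below is not the paper's method. It assumes the standard single-bit definition (ACE bit-cycles divided by total bit-cycles) and, as a simplifying assumption for an unprotected structure, treats a group of adjacent bits as vulnerable in a cycle if any bit in the group is ACE. The function names and the two example structures are hypothetical.

```python
from typing import List

# ace[c][b] is True when bit b is ACE (needed for correct execution) in cycle c.
AceMap = List[List[bool]]

def single_bit_avf(ace: AceMap) -> float:
    """Standard AVF: ACE bit-cycles divided by total bit-cycles."""
    cycles, bits = len(ace), len(ace[0])
    ace_bit_cycles = sum(sum(row) for row in ace)
    return ace_bit_cycles / (bits * cycles)

def multi_bit_avf(ace: AceMap, width: int) -> float:
    """Toy MB-AVF for a fault mode covering `width` adjacent bits.

    Assumption (not taken from the paper): a group is vulnerable in a
    cycle if ANY bit in it is ACE, as in an unprotected structure.
    """
    cycles, bits = len(ace), len(ace[0])
    groups = bits - width + 1  # sliding window of adjacent bits
    ace_group_cycles = sum(
        any(row[g:g + width]) for row in ace for g in range(groups)
    )
    return ace_group_cycles / (groups * cycles)

# Two hypothetical 4-bit structures observed for 2 cycles.
# Both have half of their bit-cycles ACE, laid out differently.
aligned = [[True, True, False, False],
           [True, True, False, False]]
staggered = [[True, False, True, False],
             [True, False, True, False]]

for name, ace in (("aligned", aligned), ("staggered", staggered)):
    print(name, single_bit_avf(ace), multi_bit_avf(ace, width=2))
```

Under these assumptions both structures have the same single-bit AVF, yet their 2-bit values differ because the result depends on where the ACE bits sit relative to each other, information that a single-bit AVF does not carry.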

          Published In

          MICRO-47: Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture
          December 2014
          697 pages
          ISBN:9781479969982

          Publisher

          IEEE Computer Society

          United States

          Author Tags

          1. fault tolerance
          2. reliability
          3. soft errors

          Qualifiers

          • Tutorial
          • Research
          • Refereed limited

          Conference

          MICRO-47

          Acceptance Rates

          MICRO-47 Paper Acceptance Rate 53 of 279 submissions, 19%;
          Overall Acceptance Rate 484 of 2,242 submissions, 22%
