Neutron sensitivity and software hardening strategies for matrix multiplication and FFT on graphics processing units

Published: 18 June 2013

Abstract

In this paper, we compare the radiation response of GPUs executing matrix multiplication and FFT algorithms. The experimental results show that, for both algorithms, the output is affected by multiple errors in the majority of cases. Architectural and code analyses indicate that these multiple errors are caused by the corruption of shared resources or by dependencies between threads. The experimental data and analytical studies can be used to estimate the expected error rate of GPUs in realistic applications and to design specific, optimized software-based hardening procedures.
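
To make the idea of a software-based hardening check for matrix multiplication concrete, the following is a minimal sketch of a row/column checksum test in the general spirit of algorithm-based fault tolerance. It is not taken from the paper and is not claimed to be the authors' procedure. The function names (`matmul`, `checksum_mismatches`), the small integer matrices, and the injected corruption are all assumptions chosen for illustration. Integer data is used so that checksums compare exactly; floating-point data would need a tolerance.

```python
def matmul(a, b):
    """Plain triple-loop product of two matrices given as lists of lists."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


def checksum_mismatches(a, b, c):
    """Return (bad_rows, bad_cols): rows/columns of c whose sums disagree
    with the checksums predicted from a and b without recomputing a*b."""
    n, k, m = len(a), len(b), len(b[0])

    # Row checksums: each row sum of c must equal a_row . (row sums of b).
    b_row_sums = [sum(row) for row in b]
    expected_rows = [sum(a[i][p] * b_row_sums[p] for p in range(k))
                     for i in range(n)]
    bad_rows = [i for i in range(n) if sum(c[i]) != expected_rows[i]]

    # Column checksums: each column sum of c must equal (column sums of a) . b_col.
    a_col_sums = [sum(a[i][p] for i in range(n)) for p in range(k)]
    expected_cols = [sum(a_col_sums[p] * b[p][j] for p in range(k))
                     for j in range(m)]
    bad_cols = [j for j in range(m)
                if sum(c[i][j] for i in range(n)) != expected_cols[j]]
    return bad_rows, bad_cols


a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
b = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
c = matmul(a, b)
clean = checksum_mismatches(a, b, c)    # no mismatches expected

# Hypothetical multiple-error event: two elements of one output row corrupted.
c[1][0] += 5
c[1][2] -= 3
faulty = checksum_mismatches(a, b, c)   # flags row 1 and columns 0 and 2
```

With a single corrupted element, the flagged row and column intersect at the faulty position. With several corrupted elements, as in the example, the flags only bound the affected region, and errors whose contributions cancel within a row or column can go unnoticed. This is one reason the number and layout of output errors matter when choosing a hardening scheme.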

Published In

FTXS '13: Proceedings of the 3rd Workshop on Fault-tolerance for HPC at extreme scale
June 2013
64 pages
ISBN: 9781450319836
DOI: 10.1145/2465813

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States

Author Tags

1. GPU
2. parallel architectures sensitivity
3. radiation effects
4. software-based hardening

Qualifiers

• Research-article

Conference

HPDC'13

Acceptance Rates

FTXS '13 Paper Acceptance Rate 7 of 10 submissions, 70%;
Overall Acceptance Rate 16 of 25 submissions, 64%
