
No PAIN, No Gain? The Utility of PArallel Fault INjections

Published: 16 May 2015

Abstract

Software Fault Injection (SFI) is an established technique for assessing the robustness of software under test by exposing it to faults in its operational environment. Depending on the complexity of this operational environment, the complexity of the software under test, and the number and types of injected faults, a thorough SFI assessment can entail (a) numerous experiments and (b) long experiment run times, both of which contribute to a considerable overall test execution time.
To counteract these long execution times when dealing with complex systems, recent works propose exploiting parallel hardware to execute multiple experiments at the same time. While PArallel fault INjections (PAIN) yield higher experiment throughput, they rest on an implicit assumption of non-interference among the simultaneously executing experiments. In this paper we investigate the validity of this assumption and determine the trade-off between the increased throughput and the accuracy of the experimental results obtained from PAIN experiments.
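To make the parallelization idea concrete, here is a minimal Python sketch that dispatches independent fault injection experiments across a pool of worker processes and classifies each outcome. The command-line harness ./run_experiment, its --fault parameter, the fault identifiers, and the outcome classification are illustrative assumptions, not the paper's PAIN tooling; the sketch only shows the throughput mechanism and where the non-interference assumption enters.

    # Hypothetical sketch of PAIN-style parallel experiment dispatch.
    # Assumes each experiment runs in an isolated sandbox (e.g., a VM or
    # emulator) driven by a command-line harness called `run_experiment`;
    # the harness and its interface are illustrative, not the paper's tooling.
    import subprocess
    from concurrent.futures import ProcessPoolExecutor

    FAULTS = [f"fault-{i:04d}" for i in range(1000)]  # one injected fault per experiment
    PARALLELISM = 4  # degree of parallelism; the paper's trade-off variable

    def run_experiment(fault_id: str, timeout_s: int = 300) -> tuple[str, str]:
        """Run a single fault injection experiment and classify its outcome."""
        try:
            proc = subprocess.run(
                ["./run_experiment", "--fault", fault_id],  # hypothetical harness
                capture_output=True,
                timeout=timeout_s,
            )
            outcome = "crash" if proc.returncode != 0 else "no-failure"
        except subprocess.TimeoutExpired:
            outcome = "hang"  # no verdict before the timeout elapsed
        return fault_id, outcome

    if __name__ == "__main__":
        # Running experiments concurrently raises throughput, but it only
        # yields the same result distribution as sequential runs if the
        # experiments do not interfere (the assumption the paper puts to
        # the test).
        with ProcessPoolExecutor(max_workers=PARALLELISM) as pool:
            for fault_id, outcome in pool.map(run_experiment, FAULTS):
                print(fault_id, outcome)

Note that contention among co-running workers (CPU, memory, I/O) can inflate experiment latencies and, for instance, turn borderline runs into spurious hangs, which is exactly the kind of interference that threatens the accuracy of the collected failure-mode distribution.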


Published In

ICSE '15: Proceedings of the 37th International Conference on Software Engineering - Volume 1
May 2015, 999 pages
ISBN: 9781479919345
Publisher: IEEE Press

Conference

ICSE '15
Overall acceptance rate: 276 of 1,856 submissions (15%)
