DOI: 10.1145/2635868.2635920

An empirical analysis of flaky tests

Published: 11 November 2014

Abstract

Regression testing is a crucial part of software development. It checks that software changes do not break existing functionality. An important assumption of regression testing is that test outcomes are deterministic: an unmodified test is expected to either always pass or always fail for the same code under test. Unfortunately, in practice, some tests, often called flaky tests, have non-deterministic outcomes. Such tests undermine regression testing, as they make it difficult to rely on test results. We present the first extensive study of flaky tests. We study in detail a total of 201 commits that likely fix flaky tests in 51 open-source projects. We classify the most common root causes of flaky tests, identify approaches that could manifest flaky behavior, and describe common strategies that developers use to fix flaky tests. We believe that our insights and implications can help guide future research on the important topic of (avoiding) flaky tests.
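To make the notion of a non-deterministic test outcome concrete, the minimal JUnit sketch below illustrates asynchronous waiting, one of the root-cause categories examined in the paper: the first test sleeps for a fixed interval and then asserts on work done by a background thread, so it passes or fails depending on timing; the second test shows a polling-style fix of the kind developers commonly apply. The `Downloader` class, its methods, and the timing constants are hypothetical and are not taken from the studied projects.

```java
import static org.junit.Assert.assertTrue;
import org.junit.Test;

public class DownloaderTest {

    // Hypothetical class used only for this sketch.
    static class Downloader {
        private volatile boolean done = false;

        void startAsyncDownload() {
            new Thread(() -> {
                try {
                    Thread.sleep(150); // simulated background work
                } catch (InterruptedException ignored) {
                }
                done = true;
            }).start();
        }

        boolean isDone() {
            return done;
        }
    }

    // Flaky version: assumes the background work always finishes within 100 ms.
    // Its outcome depends on scheduling and machine load, not on the code under test.
    @Test
    public void flakyAsyncWait() throws InterruptedException {
        Downloader d = new Downloader();
        d.startAsyncDownload();
        Thread.sleep(100); // fixed sleep is not a real synchronization point
        assertTrue(d.isDone());
    }

    // Deterministic version: poll until the condition holds, with an upper bound
    // so the test cannot hang forever if the download never completes.
    @Test
    public void fixedAsyncWait() throws InterruptedException {
        Downloader d = new Downloader();
        d.startAsyncDownload();
        long deadline = System.currentTimeMillis() + 5_000;
        while (!d.isDone() && System.currentTimeMillis() < deadline) {
            Thread.sleep(10); // waitFor-style polling instead of a fixed sleep
        }
        assertTrue(d.isDone());
    }
}
```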




Published In

FSE 2014: Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering
November 2014
856 pages
ISBN: 9781450330565
DOI: 10.1145/2635868

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Empirical study
  2. flaky tests
  3. non-determinism

Qualifiers

  • Research-article

Conference

SIGSOFT/FSE'14

Acceptance Rates

Overall acceptance rate: 17 of 128 submissions, 13%


Article Metrics

  • Downloads (last 12 months): 299
  • Downloads (last 6 weeks): 34
Reflects downloads up to 02 Feb 2025

