Research article
DOI: 10.1145/3377811.3381749

A study on the lifecycle of flaky tests

Published: 01 October 2020

Abstract

During regression testing, developers rely on the pass or fail outcomes of tests to check whether changes broke existing functionality. Thus, flaky tests, which nondeterministically pass or fail on the same code, are problematic because they provide misleading signals during regression testing. Although flaky tests are the focus of several existing studies, none of them studies (1) the reoccurrence, runtimes, and time-before-fix of flaky tests, or (2) flaky tests in depth on proprietary projects.
This paper fills this knowledge gap about flaky tests and investigates whether prior categorization work on flaky tests also applies to proprietary projects. Specifically, we study the lifecycle of flaky tests in six large-scale proprietary projects at Microsoft. We find, as in prior work, that asynchronous calls are the leading cause of flaky tests in these Microsoft projects. We therefore propose the first automated solution, called Flakiness and Time Balancer (FaTB), to reduce the frequency of flaky-test failures caused by asynchronous calls. Our evaluation of five such flaky tests shows that FaTB can reduce their running times by up to 78% without empirically affecting the frequency of their flaky-test failures. Lastly, our study finds several cases where developers claim they "fixed" a flaky test, but our experiments show that their changes neither fix the test nor reduce its frequency of flaky-test failures. Future studies should be more cautious when basing their results on changes that developers claim to be "fixes".
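To make the asynchronous-call scenario concrete, here is a minimal sketch in Java (with hypothetical names such as fetchStatusAsync; it is illustrative only, not code from the paper, and it does not show FaTB itself). A test that waits a fixed amount of time for an asynchronous result fails whenever the call happens to take longer than the wait, so shortening the wait speeds up the test but raises the chance of a flaky failure.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Illustrative sketch of an async-wait flaky test; hypothetical names, not code from the paper.
public class AsyncFlakyTestSketch {

    // Stands in for a backend call whose latency varies from run to run.
    static CompletableFuture<String> fetchStatusAsync() {
        return CompletableFuture.supplyAsync(() -> {
            try {
                // Fake nondeterministic latency between 0 and 150 ms.
                Thread.sleep((long) (Math.random() * 150));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "READY";
        });
    }

    public static void main(String[] args) throws Exception {
        CompletableFuture<String> result = fetchStatusAsync();
        // A 100 ms wait is sometimes shorter than the call's latency, so this
        // get() nondeterministically times out: the test is flaky. A much longer
        // wait (say 1000 ms) would almost always pass but inflate the test's
        // running time; tuning this value is the runtime-versus-flakiness
        // trade-off described in the abstract.
        String status = result.get(100, TimeUnit.MILLISECONDS);
        if (!"READY".equals(status)) {
            throw new AssertionError("expected READY but got " + status);
        }
        System.out.println("test passed");
    }
}
```

Rerunning this program a few times should show both outcomes; a tool in the spirit of FaTB would search for the smallest wait that keeps the failure frequency at a level the developers tolerate.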

Published In

ICSE '20: Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering
June 2020
1640 pages
ISBN: 9781450371216
DOI: 10.1145/3377811

In-Cooperation

  • KIISE: Korean Institute of Information Scientists and Engineers
  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. empirical study
  2. flaky test
  3. lifecycle

Conference

ICSE '20

Acceptance Rates

Overall Acceptance Rate 276 of 1,856 submissions, 15%
