DOI: 10.1145/2950290.2950324

Can testedness be effectively measured?

Published: 01 November 2016

Abstract

Among the major questions that a practicing tester faces are deciding where to focus additional testing effort, and deciding when to stop testing. "Test the least-tested code, and stop when all code is well-tested" is a reasonable answer. Many measures of "testedness" have been proposed; unfortunately, we do not know whether these are truly effective. In this paper we propose a novel evaluation of two of the most important and widely-used measures of test suite quality. The first measure is statement coverage, the simplest and best-known code coverage measure. The second measure is mutation score, a supposedly more powerful, though expensive, measure.
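For concreteness, a minimal sketch of how these two measures are typically computed is shown below. It is not taken from the paper; the TestResults container and its fields are hypothetical names used only for illustration.

```python
# Illustrative-only sketch of the two measures discussed above.
# TestResults and its fields are hypothetical names, not an API from the paper or from PIT.
from dataclasses import dataclass

@dataclass
class TestResults:
    executed_statements: set  # statements hit by at least one test
    all_statements: set       # all executable statements in the program element
    killed_mutants: int       # mutants detected ("killed") by the test suite
    total_mutants: int        # mutants generated for the program element

def statement_coverage(r: TestResults) -> float:
    """Fraction of executable statements exercised by the test suite."""
    return len(r.executed_statements) / len(r.all_statements)

def mutation_score(r: TestResults) -> float:
    """Fraction of generated mutants that the test suite kills."""
    return r.killed_mutants / r.total_mutants
```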
We evaluate these measures using the actual criteria of interest: if a program element is (by these measures) well tested at a given point in time, it should require fewer future bug-fixes than a "poorly tested" element. If not, then it seems likely that we are not effectively measuring testedness. Using a large number of open source Java programs from GitHub and Apache, we show that both statement coverage and mutation score have only a weak negative correlation with bug-fixes. Despite the lack of strong correlation, there are statistically and practically significant differences between program elements for various binary criteria. Program elements (other than classes) covered by any test case see about half as many bug-fixes as those not covered, and a similar line can be drawn for mutation score thresholds. Our results have important implications for both software engineering practice and research evaluation.
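The following sketch illustrates the shape of such an evaluation, assuming per-element testedness scores and counts of later bug-fixes are already available. The data is invented, and the statistical tests shown (Spearman rank correlation, Mann-Whitney U) merely stand in for whatever analysis the paper actually performs.

```python
# Illustrative sketch only: correlate a testedness measure with later bug-fixes,
# and compare elements with any coverage against elements with none.
# The numbers below are invented.
from scipy.stats import spearmanr, mannwhitneyu

# Hypothetical per-method data: statement coverage and future bug-fix count.
coverage  = [0.0, 0.0, 0.2, 0.5, 0.7, 0.8, 0.9, 1.0]
bug_fixes = [3,   2,   2,   1,   1,   0,   1,   0]

# Rank correlation between the measure and future bug-fixes.
rho, p = spearmanr(coverage, bug_fixes)
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")

# Binary criterion: covered by any test case vs. not covered at all.
covered   = [b for c, b in zip(coverage, bug_fixes) if c > 0]
uncovered = [b for c, b in zip(coverage, bug_fixes) if c == 0]
stat, p = mannwhitneyu(uncovered, covered, alternative="greater")
print(f"Mann-Whitney U = {stat:.1f} (p = {p:.3f})")
```

A weak negative rho combined with a significant difference for the binary split would mirror the pattern the abstract reports.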

Published In

FSE 2016: Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering
November 2016
1156 pages
ISBN:9781450342186
DOI:10.1145/2950290

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. coverage criteria
  2. mutation testing
  3. statistical analysis
  4. test suite evaluation

Qualifiers

  • Research-article

Conference

FSE'16

Acceptance Rates

Overall acceptance rate: 17 of 128 submissions (13%)
