Do Automatically Generated Test Cases Make Debugging Easier? An Experimental Assessment of Debugging Effectiveness and Efficiency

Published: 02 December 2015

Abstract

Several techniques and tools have been proposed for the automatic generation of test cases. These tools are usually evaluated in terms of their fault-revealing or coverage capability, but their impact on the manual debugging activity is not considered. The question is whether automatically generated test cases support debugging as effectively as manually written tests do.
We conducted a family of three experiments (five replications) with human participants (55 subjects in total) to assess whether the features that make automatically generated test cases less readable and understandable (e.g., unclear test scenarios, meaningless identifiers) affect the effectiveness and efficiency of debugging. The first two experiments compare different test case generation tools (Randoop vs. EvoSuite). The third experiment investigates the role of code identifiers in test cases (obfuscated vs. original identifiers), since a major difference between manual and automatically generated test cases is that the latter contain meaningless (obfuscated) identifiers.
We show that automatically generated test cases are as useful for debugging as manual test cases. Furthermore, we find that, for less experienced developers, automatic tests are more useful on average due to their lower static and dynamic complexity.
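
To make the readability gap the study investigates concrete, consider the contrast below. This is an illustrative sketch of ours, not material from the paper: a manually written JUnit 4 test for java.util.Stack next to a hand-written imitation of the style Randoop typically emits, with sequential identifiers (var0, var1, ...) and a generic method name instead of a descriptive scenario.

```java
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;

import java.util.Stack;
import org.junit.Test;

// Hypothetical illustration of the readability gap studied in the paper.
public class StackTestStyles {

    // Manually written test: descriptive name, clear scenario, focused assertion.
    @Test
    public void popReturnsMostRecentlyPushedElement() {
        Stack<String> stack = new Stack<>();
        stack.push("first");
        stack.push("second");
        assertEquals("second", stack.pop());
    }

    // Hand-written imitation of a Randoop-style generated test: meaningless
    // identifiers and no explicit scenario, the kind of naming the third
    // experiment emulates by obfuscating identifiers in manual tests.
    @Test
    public void test042() {
        Stack<String> var0 = new Stack<>();
        String var1 = var0.push("hi!"); // push returns its argument; var1 is unused, as is common in generated code
        String var2 = var0.pop();
        boolean var3 = var0.empty();
        assertEquals("hi!", var2);
        assertTrue(var3);
    }
}
```

Both tests exercise the same behavior; the question the experiments ask is whether the second style slows down or misleads a developer who must debug the failure it reveals.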

Supplementary Material

a5-ceccato-apndx.pdf (ceccato.zip)
Supplemental movie, appendix, image, and software files for "Do Automatically Generated Test Cases Make Debugging Easier? An Experimental Assessment of Debugging Effectiveness and Efficiency"

Published In

ACM Transactions on Software Engineering and Methodology, Volume 25, Issue 1
December 2015
339 pages
ISSN: 1049-331X
EISSN: 1557-7392
DOI: 10.1145/2852270

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 02 December 2015
Accepted: 01 April 2015
Revised: 01 November 2014
Received: 01 May 2014
Published in TOSEM Volume 25, Issue 1

Author Tags

  1. Empirical software engineering
  2. automatic test case generation
  3. debugging

Qualifiers

  • Research-article
  • Research
  • Refereed
