A theoretical analysis of the risk evaluation formulas for spectrum-based fault localization

Authors: Xiaoyuan Xie, Tsong Yueh Chen, Fei-Ching Kuo, Baowen Xu

ACM Transactions on Software Engineering and Methodology (TOSEM), Volume 22, Issue 4

Article No.: 31, Pages 1 - 40

https://doi.org/10.1145/2522920.2522924

Published: 22 October 2013

Abstract

An important research area of Spectrum-Based Fault Localization (SBFL) is the effectiveness of risk evaluation formulas. Most previous studies have adopted an empirical approach, which can hardly be considered sufficiently comprehensive because of the huge number of combinations of various factors in SBFL. Though some studies have aimed at overcoming the limitations of the empirical approach, none of them has provided a completely satisfactory solution. Therefore, we provide a theoretical investigation of the effectiveness of risk evaluation formulas. We define two types of relations between formulas, namely, equivalent and better. To identify the relations between formulas, we develop an innovative framework for the theoretical investigation. Our framework is based on the concept that the determinant of the effectiveness of a formula is the number of statements with risk values higher than the risk value of the faulty statement. We group all program statements into three disjoint sets with risk values higher than, equal to, and lower than the risk value of the faulty statement, respectively. For different formulas, the sizes of their sets are compared using the notion of subset. We use this framework to identify the maximal formulas, which should be the only formulas to be used in SBFL.
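To make the three-set grouping concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' framework: the abstract names no specific formula, so the well-known Ochiai formula is used as a stand-in, and the spectrum counts, the statement names, and the helper `partition` are all hypothetical.

```python
import math

def ochiai(ef, ep, nf, np_):
    """Ochiai risk value from spectrum counts (illustrative choice of formula).

    ef/ep: failed/passed tests that execute the statement;
    nf/np_: failed/passed tests that do not execute it.
    """
    denom = math.sqrt((ef + nf) * (ef + ep))
    return ef / denom if denom else 0.0

def partition(risks, faulty):
    """Group statements by risk value relative to the faulty statement."""
    rf = risks[faulty]
    higher = {s for s, r in risks.items() if r > rf}
    equal = {s for s, r in risks.items() if r == rf}  # contains the faulty statement
    lower = {s for s, r in risks.items() if r < rf}
    return higher, equal, lower

# Hypothetical spectrum with 2 failed and 3 passed tests:
# statement -> (ef, ep, nf, np)
spectrum = {
    "s1": (2, 3, 0, 0),  # assumed faulty statement
    "s2": (2, 1, 0, 2),
    "s3": (1, 0, 1, 3),
    "s4": (0, 2, 2, 1),
}
risks = {s: ochiai(*counts) for s, counts in spectrum.items()}
higher, equal, lower = partition(risks, faulty="s1")
# higher == {"s2", "s3"}, equal == {"s1"}, lower == {"s4"}
```

In this toy case two statements outrank the faulty one, so a formula whose "higher" set is always a subset of this one could never rank the fault worse on the same data.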

Index Terms

A theoretical analysis of the risk evaluation formulas for spectrum-based fault localization
1. Software and its engineering
  1. Software creation and management
    1. Software verification and validation
      1. Software defect analysis
        Software testing and debugging

Reviews

Reviewer: T.H. Tse

Spectrum-based fault localization is a popular technique in automatic program debugging. Researchers analyze the distribution of pass and fail cases in program testing using different risk evaluation formulas, and validate how their proposals are better than earlier work via empirical studies. In this paper, the authors propose a theoretical framework to compare 30 risk evaluation formulas in terms of the percentage of code examined before a fault is identified. They rank the formulas using "better" and "equivalent" relations. Only five formulas are proven to be the most efficient. Many of the best-known formulas are not among them.

There is an unhealthy tendency toward empirical studies in software testing and debugging research. Researchers use hypothesis testing to determine whether their proposal is better than that of their predecessors. Reviewers demand more subject programs and larger test pools for further validation. It is refreshing to see that the authors of this paper do not simply rely on empirical studies, but prove mathematically whether various proposals have hit their mark.

This paper is not the only example of the successful application of mathematical theory by Chen's research group. Chen and Merkel prove in one paper [1] that no test case generation technique can be better than random testing by more than 50 percent. Hence, their proposed adaptive random testing technique is close to this theoretic limit. In another paper [2], Chen and Yu prove that their proposed proportional sampling strategy is the only partition testing strategy that ensures that the probability of finding at least one failure is no lower than random testing for any program. Understandably, some researchers are disgruntled because these theoretical results stop them from making further incremental proposals.

Online Computing Reviews Service
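The comparison criterion mentioned in the review, the percentage of code examined before a fault is identified, can be sketched in a few lines of Python. This is a hedged illustration only: the risk values are made up, the function name `examined_percentage_bounds` is hypothetical, and the best-case/worst-case handling of ties is an assumption rather than the paper's stated tie-breaking scheme.

```python
def examined_percentage_bounds(risks, faulty):
    """Return (best, worst) percentage of statements examined before the
    faulty statement is reached when inspecting in descending risk order.

    Best case: the faulty statement is inspected first among its ties.
    Worst case: it is inspected last among its ties.
    """
    rf = risks[faulty]
    n_higher = sum(1 for r in risks.values() if r > rf)
    n_equal = sum(1 for r in risks.values() if r == rf)  # includes the faulty one
    total = len(risks)
    best = 100 * (n_higher + 1) / total
    worst = 100 * (n_higher + n_equal) / total
    return best, worst

# Hypothetical risk values for five statements; "s1" is assumed faulty.
risks = {"s1": 0.5, "s2": 0.9, "s3": 0.5, "s4": 0.1, "s5": 0.0}
best, worst = examined_percentage_bounds(risks, "s1")
# best == 40.0, worst == 60.0
```

Only the statements ranked above or tied with the faulty one affect the result, which is why counting statements with higher risk values is enough to compare formulas.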

Published In

ACM Transactions on Software Engineering and Methodology Volume 22, Issue 4

Testing, debugging, and error handling, formal methods, lifecycle concerns, evolution and maintenance

October 2013

387 pages

ISSN:1049-331X

EISSN:1557-7392

DOI:10.1145/2522920

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 October 2013

Accepted: 01 October 2012

Revised: 01 September 2012

Received: 01 January 2012

Published in TOSEM Volume 22, Issue 4

Qualifiers

Research-article
Research
Refereed

Bibliometrics & Citations

Total Citations: 262
Total Downloads: 1,506
Downloads (Last 12 months): 67
Downloads (Last 6 weeks): 8

Reflects downloads up to 31 Dec 2024

Cited By

Lee S, Binkley D, Feldt R, Gold N, Yoo S (2025). Causal program dependence analysis. Science of Computer Programming 240 (103208). Online publication date: Feb-2025.
https://doi.org/10.1016/j.scico.2024.103208

Shen Y, Gao X, Sun H, Guo Y (2025). Understanding vulnerabilities in software supply chains. Empirical Software Engineering 30:1. Online publication date: 1-Feb-2025.
https://dl.acm.org/doi/10.1007/s10664-024-10581-2

Zheng W, Hu H, Chen T, Yang F, Fan X, Xiao P (2024). Boosting Spectrum-Based Fault Localization via Multi-Correct Programs in Online Programming. IEICE Transactions on Information and Systems E107.D:4 (525-536). Online publication date: 1-Apr-2024.
https://doi.org/10.1587/transinf.2023EDP7164

Zhang Z, Li D, Xia L, Li Y, Meng X (2024). A Data Augmentation Method for Fault Localization with Fault Propagation Context and VAE. IEICE Transactions on Information and Systems E107.D:2 (234-238). Online publication date: 1-Feb-2024.
https://doi.org/10.1587/transinf.2023EDL8052

Zhang X, Song Y, Xie X, Xin Q, Xing C, Filkov V, Ray B, Zhou M (2024). Do not neglect what's on your hands: localizing software faults with exception trigger stream. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering (982-994). Online publication date: 27-Oct-2024.
https://dl.acm.org/doi/10.1145/3691620.3695479

Wu J, Zhang Z, Yang D, Xu J, He J, Mao X (2024). Time-Aware Spectrum-Based Bug Localization for Hardware Design Code with Data Purification. ACM Transactions on Architecture and Code Optimization 21:3 (1-25). Online publication date: 12-Jul-2024.
https://dl.acm.org/doi/10.1145/3678009

Song Y, Zhang X, Xie X, Chen S, Liu Q, Gao R (2024). SURE: A Visualized Failure Indexing Approach Using Program Memory Spectrum. ACM Transactions on Software Engineering and Methodology 33:8 (1-43). Online publication date: 8-Jul-2024.
https://dl.acm.org/doi/10.1145/3676958

Horváth F, Aszmann R, Soha P, Beszédes Á, Gyimóthy T (2024). Context Switch Sensitive Fault Localization. Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering (110-119). Online publication date: 18-Jun-2024.
https://dl.acm.org/doi/10.1145/3661167.3661181

Rafi M, Kim D, Chen A, Chen T, Wang S (2024). Towards Better Graph Neural Network-Based Fault Localization through Enhanced Code Representation. Proceedings of the ACM on Software Engineering 1:FSE (1937-1959). Online publication date: 12-Jul-2024.
https://dl.acm.org/doi/10.1145/3660793

Song Y, Zhang X, Xie X, Liu Q, Gao R, Xing C, Roychoudhury A, Paiva A, Abreu R, Storey M (2024). ReClues: Representing and indexing failures in parallel debugging with program variables. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering (1-13). Online publication date: 20-May-2024.
https://dl.acm.org/doi/10.1145/3597503.3639098
