skip to main content
10.1145/3377811.3380338acmconferencesArticle/Chapter ViewAbstractPublication PagesicseConference Proceedingsconference-collections
research-article

On the efficiency of test suite based program repair: A Systematic Assessment of 16 Automated Repair Systems for Java Programs

Published: 01 October 2020 Publication History

Abstract

Test-based automated program repair has been a prolific field of research in software engineering in the last decade. Many approaches have indeed been proposed, which leverage test suites as a weak, but affordable, approximation to program specifications. Although the literature regularly sets new records on the number of benchmark bugs that can be fixed, several studies increasingly raise concerns about the limitations and biases of state-of-the-art approaches. For example, the correctness of generated patches has been questioned in a number of studies, while other researchers pointed out that evaluation schemes may be misleading with respect to the processing of fault localization results. Nevertheless, there is little work addressing the efficiency of patch generation, with regard to the practicality of program repair. In this paper, we fill this gap in the literature, by providing an extensive review on the efficiency of test suite based program repair. Our objective is to assess the number of generated patch candidates, since this information is correlated to (1) the strategy to traverse the search space efficiently in order to select sensical repair attempts, (2) the strategy to minimize the test effort for identifying a plausible patch, (3) as well as the strategy to prioritize the generation of a correct patch. To that end, we perform a large-scale empirical study on the efficiency, in terms of quantity of generated patch candidates of the 16 open-source repair tools for Java programs. The experiments are carefully conducted under the same fault localization configurations to limit biases. Eventually, among other findings, we note that: (1) many irrelevant patch candidates are generated by changing wrong code locations; (2) however, if the search space is carefully triaged, fault localization noise has little impact on patch generation efficiency; (3) yet, current template-based repair systems, which are known to be most effective in fixing a large number of bugs, are actually least efficient as they tend to generate majoritarily irrelevant patch candidates.

References

[1]
Rui Abreu, Peter Zoeteweij, and Arjan JC Van Gemund. 2007. On the accuracy of spectrum-based fault localization. In Testing: Academic and Industrial Conference Practice and Research Techniques-MUTATION (TAICPART-MUTATION). IEEE, 89--98.
[2]
Miltiadis Allamanis, Earl T. Barr, Premkumar Devanbu, and Charles Sutton. 2018. A Survey of Machine Learning for Big Code and Naturalness. Comput. Surveys 51, 4 (2018), 81:1--81:37.
[3]
Tien-Duy B. Le, David Lo, Claire Le Goues, and Lars Grunske. 2016. A Learning-to-Rank Based Fault Localization Approach Using Likely Invariants. In Proceedings of the 25th International Symposium on Software Testing and Analysis. ACM, 177--188.
[4]
Liushan Chen, Yu Pei, and Carlo A Furia. 2017. Contract-based program repair without the contracts. In Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering. IEEE, 637--647.
[5]
Vidroha Debroy and W Eric Wong. 2010. Using mutation to automatically suggest fixes for faulty programs. In Proceedings of the 3rd International Conference on Software Testing, Verification and Validation. IEEE, 65--74.
[6]
Thomas Durieux, Benoit Cornu, Lionel Seinturier, and Martin Monperrus. 2017. Dynamic patch generation for null pointer exceptions using metaprogramming. In Proceedings of the 24th International Conference on Software Analysis, Evolution and Reengineering. IEEE, 349--358.
[7]
Thomas Durieux, Fernanda Madeiral, Matias Martinez, and Rui Abreu. 2019. Empirical Review of Java Program Repair Tools: A Large-Scale Experiment on 2,141 Bugs and 23,551 Repair Attempts. In Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. ACM, 302--313.
[8]
Thomas Durieux and Martin Monperrus. 2016. Dynamoth: dynamic code synthesis for automatic program repair. In Proceedings of the 11th IEEE/ACM International Workshop in Automation of Software Test. IEEE, 85--91.
[9]
Michael Frigge, David C. Hoaglin, and Boris Iglewicz. 1989. Some implementations of the boxplot. The American Statistician 43, 1 (1989), 50--54.
[10]
Zachary P. Fry, Bryan Landau, and Westley Weimer. 2012. A Human Study of Patch Maintainability. In Proceedings of the 21st International Symposium on Software Testing and Analysis. ACM, 177--187.
[11]
Luca Gazzola, Daniela Micucci, and Leonardo Mariani. 2017. Automatic software repair: A survey. IEEE Transactions on Software Engineering 45, 1 (2017), 34--67.
[12]
Ali Ghanbari, Samuel Benton, and Lingming Zhang. 2019. Practical program repair via bytecode mutation. In Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis. ACM, 19--30.
[13]
Rahul Gupta, Soham Pal, Aditya Kanade, and Shirish Shevade. 2017. Deepfix: Fixing common c language errors by deep learning. In Proceedings of the 31st AAAI Conference on Artificial Intelligence. AAAI, 1345--1351.
[14]
Jinru Hua, Mengshi Zhang, Kaiyuan Wang, and Sarfraz Khurshid. 2018. Towards practical program repair with on-demand candidate generation. In Proceedings of the 40th International Conference on Software Engineering. ACM, 12--23.
[15]
Jiajun Jiang, Yingfei Xiong, Hongyu Zhang, Qing Gao, and Xiangqun Chen. 2018. Shaping program repair space with existing patches and similar code. In Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis. ACM, 298--309.
[16]
René Just, Darioush Jalali, and Michael D Ernst. 2014. Defects4J: A database of existing faults to enable controlled testing studies for Java programs. In Proceedings of the 23rd International Symposium on Software Testing and Analysis. ACM, 437--440.
[17]
Dongsun Kim, Jaechang Nam, Jaewoo Song, and Sunghun Kim. 2013. Automatic patch generation learned from human-written patches. In Proceedings of the 35th International Conference on Software Engineering. IEEE, 802--811.
[18]
Kisub Kim, Dongsun Kim, Tegawendé F. Bissyandé, Eunjong Choi, Li Li, Jacques Klein, and Yves Le Traon. 2018. FaCoY: a code-to-code search engine. In Proceedings of the 40th International Conference on Software Engineering. ACM, 946--957.
[19]
Anil Koyuncu, Kui Liu, Tegawendé F. Bissyandé, Dongsun Kim, Jacques Klein, Martin Monperrus, and Yves Le Traon. 2018. Fixminer: Mining relevant fix patterns for automated program repair. arXiv preprint arXiv:1810.01791 (2018).
[20]
Xuan-Bach D. Le, Lingfeng Bao, David Lo, Xin Xia, Shanping Li, and Corina Pasareanu. 2019. On reliability of patch correctness assessment. In Proceedings of the 41st International Conference on Software Engineering. IEEE, 524--535.
[21]
Xuan-Bach D. Le, Duc-Hiep Chu, David Lo, Claire Le Goues, and Willem Visser. 2017. S3: syntax-and semantic-guided repair synthesis via programming by examples. In Proceedings of the 11th Joint Meeting on Foundations of Software Engineering. ACM, 593--604.
[22]
Xuan Bach D. Le, David Lo, and Claire Le Goues. 2016. History driven program repair. In Proceedings of the 23rd IEEE International Conference on Software Analysis, Evolution, and Reengineering. IEEE, 213--224.
[23]
Xuan Bach D. Le, Ferdian Thung, David Lo, and Claire Le Goues. 2018. Overfitting in semantics-based automated program repair. Empirical Software Engineering 23, 5 (2018), 3007--3033.
[24]
Claire Le Goues, Michael Dewey-Vogt, Stephanie Forrest, and Westley Weimer. 2012. A systematic study of automated program repair: Fixing 55 out of 105 bugs for $8 each. In Proceedings of the 34th International Conference on Software Engineering. IEEE, 3--13.
[25]
Claire Le Goues, ThanhVu Nguyen, Stephanie Forrest, and Westley Weimer. 2012. GenProg: A generic method for automatic software repair. IEEE Transactions on Software Engineering 38, 1 (2012), 54--72.
[26]
Claire Le Goues, Michael Pradel, and Abhik Roychoudhury. 2019. Automated Program Repair. Commun. ACM 62, 12 (2019), 56--65.
[27]
Derrick Lin, James Koppel, Angela Chen, and Armando Solar-Lezama. 2017. QuixBugs: A multi-lingual program repair benchmark set based on the Quixey Challenge. In Proceedings Companion of the 32nd ACM SIGPLAN International Conference on Systems, Programming, Languages, and Applications: Software for Humanity. ACM, 55--56.
[28]
Kui Liu, Dongsun Kim, Tegawendé F Bissyandé, Shin Yoo, and Yves Le Traon. 2018. Mining fix patterns for findbugs violations. IEEE Transactions on Software Engineering (2018).
[29]
Kui Liu, Anil Koyuncu, Tegawendé F. Bissyandé, Dongsun Kim, Jacques. Klein, and Yves Le Traon. 2019. You Cannot Fix What You Cannot Find! An Investigation of Fault Localization Bias in Benchmarking Automated Program Repair Systems. In Proceedings of the 12th IEEE Conference on Software Testing, Validation and Verification. 102--113.
[30]
Kui Liu, Anil Koyuncu, Dongsun Kim, and Tegawendé F Bissyandé. 2019. Avatar: Fixing semantic bugs with fix patterns of static analysis violations. In Proceedings of the 26th IEEE International Conference on Software Analysis, Evolution and Reengineering. IEEE, 456--467.
[31]
Kui Liu, Anil Koyuncu, Dongsun Kim, and Tegawendé F. Bissyandé. 2019. TBar: Revisiting Template-based Automated Program Repair. In Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis. ACM, 31--42.
[32]
Kui Liu, Anil Koyuncu, Kisub Kim, Dongsun Kim, and Tegawendé F. Bissyandé. 2018. LSRepair: Live Search of Fix Ingredients for Automated Program Repair. In Proceedings of the 25th Asia-Pacific Software Engineering Conference. 658--662.
[33]
Xuliang Liu and Hao Zhong. 2018. Mining stackoverflow for program repair. In Proceedings of the 25th IEEE International Conference on Software Analysis, Evolution and Reengineering. IEEE, 118--129.
[34]
Fan Long and Martin Rinard. 2016. An Analysis of the Search Spaces for Generate and Validate Patch Generation Systems. In Proceedings of the 38th International Conference on Software Engineering. ACM, 702--713.
[35]
Fan Long and Martin Rinard. 2016. Automatic Patch Generation by Learning Correct Code. In Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. ACM, 298--312.
[36]
Fernanda Madeiral, Simon Urli, Marcelo Maia, and Martin Monperrus. 2019. Bears: An Extensible Java Bug Benchmark for Automatic Program Repair Studies. In Proceedings of the 26th IEEE International Conference on Software Analysis, Evolution and Reengineering. IEEE, 468--478.
[37]
Henry B Mann and Donald R. Whitney. 1947. On a Test of Whether One of Two Random Variables Is Stochastically Larger than the Other. The Annals of Mathematical Statistics 18, 1 (1947), 50--60.
[38]
Matias Martinez and Martin Monperrus. 2016. Astor: A program repair library for java. In Proceedings of the 25th International Symposium on Software Testing and Analysis. ACM, 441--444.
[39]
Matias Martinez and Martin Monperrus. 2018. Ultra-Large Repair Search Space with Automatically Mined Templates: the Cardumen Mode of Astor. In Proceedings of the 10th International Symposium on Search Based Software Engineering. Springer, 65--86.
[40]
Martin Monperrus. 2014. A critical review of automatic patch generation learned from human-written patches: essay on the problem statement and the evaluation of automatic software repair. In Proceedings of the 36th International Conference on Software Engineering. ACM, 234--242.
[41]
Martin Monperrus. 2018. Automatic software repair: A bibliography. Comput. Surveys 51, 1 (2018), 17:1--17:24.
[42]
Martin Monperrus. 2018. The Living Review on Automated Program Repair. Technical Report. Technical Report hal-01956501. HAL/archives-ouvertes. fr, HAL/archives ....
[43]
Manish Motwani, Sandhya Sankaranarayanan, René Just, and Yuriy Brun. 2018. Do automated program repair techniques repair hard and important bugs? Empirical Software Engineering 23, 5 (2018), 2901--2947.
[44]
Hoang Duong Thien Nguyen, Dawei Qi, Abhik Roychoudhury, and Satish Chandra. 2013. Semfix: Program repair via semantic analysis. In Proceedings of the 35th International Conference on Software Engineering. IEEE, 772--781.
[45]
Spencer Pearson, José Campos, René Just, Gordon Fraser, Rui Abreu, Michael D. Ernst, Deric Pang, and Benjamin Keller. 2017. Evaluating and Improving Fault Localization. In Proceedings of the 39th International Conference on Software Engineering. IEEE, 609--620.
[46]
Yuhua Qi, Xiaoguang Mao, Yan Lei, Ziying Dai, and Chengsong Wang. 2014. The strength of random search on automated program repair. In Proceedings of the 36th International Conference on Software Engineering. ACM, 254--265.
[47]
Yuhua Qi, Xiaoguang Mao, Yan Lei, and Chengsong Wang. 2013. Using automated program repair for evaluating the effectiveness of fault localization techniques. In Proceedings of the 22nd International Symposium on Software Testing and Analysis. ACM, 191--201.
[48]
Zichao Qi, Fan Long, Sara Achour, and Martin Rinard. 2015. An analysis of patch plausibility and correctness for generate-and-validate patch generation systems. In Proceedings of the 24th International Symposium on Software Testing and Analysis. ACM, 24--36.
[49]
Ripon Saha, Yingjun Lyu, Wing Lam, Hiroaki Yoshida, and Mukul Prasad. 2018. Bugs.jar: A large-scale, diverse dataset of real-world java bugs. In Proceedings of the 15th IEEE/ACM International Conference on Mining Software Repositories. IEEE, 10--13.
[50]
Ripon K. Saha, Yingjun Lyu, Hiroaki Yoshida, and Mukul R. Prasad. 2017. Elixir: Effective object-oriented program repair. In Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering. IEEE, 648--659.
[51]
Seemanta Saha, Ripon K Saha, and Mukul R Prasad. 2019. Harnessing evolution for multi-hunk program repair. In Proceedings of the 41st International Conference on Software Engineering. IEEE, 13--24.
[52]
Edward K Smith, Earl T. Barr, Claire Le Goues, and Yuriy Brun. 2015. Is the cure worse than the disease? overfitting in automated program repair. In Proceedings of the 10th Joint Meeting on Foundations of Software Engineering. ACM, 532--543.
[53]
Victor Sobreira, Thomas Durieux, Fernanda Madeiral, Martin Monperrus, and Marcelo de Almeida Maia. 2018. Dissection of a bug dataset: Anatomy of 395 patches from Defects4J. In Proceedings of the 25th International Conference on Software Analysis, Evolution and Reengineering. IEEE, 130--140.
[54]
Shangwen Wang, Ming Wen, Liqian Chen, Xin Yi, and Xiaoguang Mao. 2019. How Different Is It Between Machine-Generated and Developer-Provided Patches?: An Empirical Study on the Correct Patches Generated by Automated Program Repair Techniques. In Proceedings of the 13rd ACM/IEEE International Symposium on Empirical Software Engineering and Measurement. IEEE, 1--12.
[55]
Shangwen Wang, Ming Wen, Xiaoguang Mao, and Deheng Yang. 2019. Attention please: Consider Mockito when evaluating newly proposed automated program repair techniques. In Proceedings of the 23rd Evaluation and Assessment on Software Engineering. ACM, 260--266.
[56]
Westley Weimer, ThanhVu Nguyen, Claire Le Goues, and Stephanie Forrest. 2009. Automatically finding patches using genetic programming. In Proceedings of the 31st International Conference on Software Engineering. IEEE, 364--374.
[57]
Ming Wen, Junjie Chen, Rongxin Wu, Dan Hao, and Shing-Chi Cheung. 2018. Context-aware patch generation for better automated program repair. In Proceedings of the 40th International Conference on Software Engineering. ACM, 1--11.
[58]
Ming Wen, Rongxin Wu, Yepang Liu, Yongqiang Tian, Xuan Xie, Shing-Chi Cheung, and Zhendong Su. 2019. Exploring and Exploiting the Correlations Between Bug-Inducing and Bug-Fixing Commits. In Proceedings of the 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. ACM, 326--337.
[59]
Martin White, Michele Tufano, Matias Martinez, Martin Monperrus, and Denys Poshyvanyk. 2019. Sorting and transforming program repair ingredients via deep learning code similarities. In Proceedings of the IEEE 26th International Conference on Software Analysis, Evolution and Reengineering. IEEE, 479--490.
[60]
F. Wilcoxon. 1945. Individual Comparisons by Ranking Methods. Biometrics Bulletin 1, 6 (1945), 80--83.
[61]
Qi Xin and Steven P. Reiss. 2017. Leveraging syntax-related code for automated program repair. In Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering. IEEE, 660--670.
[62]
Yingfei Xiong, Xinyuan Liu, Muhan Zeng, Lu Zhang, and Gang Huang. 2018. Identifying patch correctness in test-based program repair. In Proceedings of the 40th International Conference on Software Engineering. ACM, 789--799.
[63]
Yingfei Xiong, Jie Wang, Runfa Yan, Jiachen Zhang, Shi Han, Gang Huang, and Lu Zhang. 2017. Precise condition synthesis for program repair. In Proceedings of the 39th IEEE/ACM International Conference on Software Engineering. IEEE, 416--426.
[64]
Jifeng Xuan, Matias Martinez, Favio Demarco, Maxime Clement, Sebastian Lamelas Marcote, Thomas Durieux, Daniel Le Berre, and Martin Monperrus. 2017. Nopol: Automatic repair of conditional statement bugs in java programs. IEEE Transactions on Software Engineering 43, 1 (2017), 34--55.
[65]
Jinqiu Yang, Alexey Zhikhartsev, Yuefei Liu, and Lin Tan. 2017. Better test cases for better automated program repair. In Proceedings of the 11th Joint Meeting on Foundations of Software Engineering. ACM, 831--841.
[66]
Jooyong Yi, Shin Hwei Tan, Sergey Mechtaev, Marcel Böhme, and Abhik Roychoudhury. 2018. A correlation study between automated program repair and test-suite metrics. Empirical Software Engineering 23, 5 (2018), 2948--2979.
[67]
Yuan Yuan and Wolfgang Banzhaf. 2018. ARJA: Automated Repair of Java Programs via Multi-Objective Genetic Programming. IEEE Transactions on Software Engineering (2018).

Cited By

View all

Index Terms

  1. On the efficiency of test suite based program repair: A Systematic Assessment of 16 Automated Repair Systems for Java Programs

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    ICSE '20: Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering
    June 2020
    1640 pages
    ISBN:9781450371216
    DOI:10.1145/3377811
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    In-Cooperation

    • KIISE: Korean Institute of Information Scientists and Engineers
    • IEEE CS

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 01 October 2020

    Permissions

    Request permissions for this article.

    Check for updates

    Badges

    Author Tags

    1. efficiency
    2. empirical assessment
    3. patch generation
    4. program repair

    Qualifiers

    • Research-article

    Funding Sources

    • FNR-RECOMMEND
    • FNR-FIXPATTERN

    Conference

    ICSE '20
    Sponsor:

    Acceptance Rates

    Overall Acceptance Rate 276 of 1,856 submissions, 15%

    Upcoming Conference

    ICSE 2025

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)180
    • Downloads (Last 6 weeks)11
    Reflects downloads up to 25 Dec 2024

    Other Metrics

    Citations

    Cited By

    View all

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media