research-article

Open access

DEAR: a novel deep learning-based approach for automated program repair

Authors:

Tien N. Nguyen

ICSE '22: Proceedings of the 44th International Conference on Software Engineering

Pages 511–523

https://doi.org/10.1145/3510003.3510177

Published: 05 July 2022

Abstract

Existing deep learning (DL)-based automated program repair (APR) models are limited in fixing general software defects. We present DEAR, a DL-based approach that supports fixing general bugs that require dependent changes, made at once, to one or multiple consecutive statements in one or multiple hunks of code. We first design a novel fault localization (FL) technique for multi-hunk, multi-statement fixes that combines traditional spectrum-based (SB) FL with deep learning and data-flow analysis. It takes the buggy statements returned by the SBFL model, detects the buggy hunks that must be fixed together, and expands a buggy statement s in a hunk to include other suspicious statements around s. We then design a two-tier, tree-based LSTM model that incorporates cycle training and uses a divide-and-conquer strategy to learn proper code transformations for fixing multiple statements in a suitable fixing context consisting of the surrounding subtrees. We conducted several experiments to evaluate DEAR on three datasets: Defects4J (395 bugs), BigFix (more than 26k bugs), and CPatMiner (more than 44k bugs). On the Defects4J dataset, DEAR outperforms the baselines by 42% to 683% in the number of auto-fixed bugs using only the top-1 patches. On the BigFix dataset, it fixes 31 to 145 more bugs than existing DL-based APR models with the top-1 patches. On the CPatMiner dataset, 169 (25.3%) of the 667 fixed bugs are multi-hunk/multi-statement bugs. DEAR fixes 71 and 164 more bugs, including 52 and 61 more multi-hunk/multi-statement bugs, respectively, than the state-of-the-art DL-based APR models.
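The fault localization step described in the abstract starts from a spectrum-based ranking of statements. As a rough illustration only, the Python sketch below scores statements with the Ochiai coefficient (an assumption: the abstract says only "traditional spectrum-based FL" and does not name a formula), then groups statements above a score threshold into hunks of consecutive lines. DEAR itself detects hunks and expands statements with deep learning and data-flow analysis; the threshold and line-adjacency rule here are hypothetical stand-ins, and the names `ochiai_scores`, `group_into_hunks`, and `localize` are illustrative, not taken from the DEAR implementation.

```python
import math
from typing import Dict, List, Set


def ochiai_scores(coverage: Dict[str, Set[int]], failing: Set[str]) -> Dict[int, float]:
    """Ochiai suspiciousness for every covered statement line.

    coverage maps each test name to the set of line numbers it executes;
    failing holds the names of the failing tests.
    """
    total_failed = len(failing)
    all_lines = set().union(*coverage.values())
    scores: Dict[int, float] = {}
    for line in all_lines:
        # ef: failing tests that execute the line; ep: passing tests that do.
        ef = sum(1 for t, cov in coverage.items() if t in failing and line in cov)
        ep = sum(1 for t, cov in coverage.items() if t not in failing and line in cov)
        # Ochiai: ef / sqrt((ef + nf) * (ef + ep)), where ef + nf = total_failed.
        denom = math.sqrt(total_failed * (ef + ep))
        scores[line] = ef / denom if denom > 0 else 0.0
    return scores


def group_into_hunks(lines: List[int], max_gap: int = 1) -> List[List[int]]:
    """Group line numbers into hunks whose neighbours are at most max_gap apart."""
    hunks: List[List[int]] = []
    for line in sorted(lines):
        if hunks and line - hunks[-1][-1] <= max_gap:
            hunks[-1].append(line)  # extend the current hunk
        else:
            hunks.append([line])  # start a new hunk
    return hunks


def localize(coverage: Dict[str, Set[int]], failing: Set[str],
             threshold: float = 0.5, max_gap: int = 1) -> List[List[int]]:
    """Hypothetical pipeline: keep lines scoring >= threshold, then form hunks."""
    scores = ochiai_scores(coverage, failing)
    suspicious = [line for line, s in scores.items() if s >= threshold]
    return group_into_hunks(suspicious, max_gap)


if __name__ == "__main__":
    # Toy coverage data: tests t2 and t3 fail.
    coverage = {
        "t1": {10, 11, 12, 20},
        "t2": {10, 11, 12, 40, 41},
        "t3": {10, 40, 41},
        "t4": {10, 20},
    }
    print(localize(coverage, {"t2", "t3"}))  # [[10, 11, 12], [40, 41]]
```

In this toy run, lines 11 and 12 score exactly 0.5 (each is covered by one failing and one passing test), so they join line 10 in one hunk, while lines 40 and 41, covered only by failing tests, form a second hunk: a multi-hunk, multi-statement candidate in the sense used above.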

Index Terms

DEAR: a novel deep learning-based approach for automated program repair
1. Software and its engineering
  1. Software notations and tools
    1. Software maintenance tools

Published In

ICSE '22: Proceedings of the 44th International Conference on Software Engineering

May 2022

2508 pages

ISBN:9781450392211

DOI:10.1145/3510003

General Chair: Matthew B Dwyer, University of Virginia

Program Chairs: Daniela Damian, University of Victoria, Canada; Andreas Zeller, CISPA, Germany

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering

In-Cooperation

IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ICSE '22

Sponsor:

SIGSOFT

ICSE '22: 44th International Conference on Software Engineering

May 21 - 29, 2022

Pittsburgh, Pennsylvania

Acceptance Rates

Overall Acceptance Rate 276 of 1,856 submissions, 15%

Bibliometrics

Total Citations: 41
Total Downloads: 1,577
Downloads (Last 12 months): 555
Downloads (Last 6 weeks): 84

Reflects downloads up to 14 Jan 2025

Cited By

Huang K, Xu Z, Yang S, Sun H, Li X, Yan Z, Zhang Y (2024). Evolving Paradigms in Automated Program Repair: Taxonomy, Challenges, and Opportunities. ACM Computing Surveys 57:2 (1-43). Online publication date: 10-Oct-2024.
https://dl.acm.org/doi/10.1145/3696450
Kim Y, Park Y, Han S, Yi J, Filkov V, Ray B, Zhou M (2024). Enhancing the Efficiency of Automated Program Repair via Greybox Analysis. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering (1719-1731). Online publication date: 27-Oct-2024.
https://dl.acm.org/doi/10.1145/3691620.3695602
Zhong W, Li C, Liu K, Ge J, Luo B, Bissyandé T, Ng V (2024). Benchmarking and Categorizing the Performance of Neural Program Repair Systems for Java. ACM Transactions on Software Engineering and Methodology 34:1 (1-35). Online publication date: 19-Aug-2024.
https://dl.acm.org/doi/10.1145/3688834
Lou Y, Yang J, Benton S, Hao D, Tan L, Chen Z, Zhang L, Zhang L (2024). When Automated Program Repair Meets Regression Testing—An Extensive Study on Two Million Patches. ACM Transactions on Software Engineering and Methodology 33:7 (1-23). Online publication date: 13-Jun-2024.
https://dl.acm.org/doi/10.1145/3672450
Pan S, Bao L, Zhou J, Hu X, Xia X, Li S, d'Amorim M (2024). Unveil the Mystery of Critical Software Vulnerabilities. Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering (138-149). Online publication date: 10-Jul-2024.
https://dl.acm.org/doi/10.1145/3663529.3663835
Xin Q, Wu H, Tang J, Liu X, Reiss S, Xuan J (2024). Detecting, Creating, Repairing, and Understanding Indivisible Multi-Hunk Bugs. Proceedings of the ACM on Software Engineering 1:FSE (2747-2770). Online publication date: 12-Jul-2024.
https://dl.acm.org/doi/10.1145/3660828
Lin B, Wang S, Wen M, Chen L, Mao X, Christakis M, Pradel M (2024). One Size Does Not Fit All: Multi-granularity Patch Generation for Better Automated Program Repair. Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (1554-1566). Online publication date: 11-Sep-2024.
https://dl.acm.org/doi/10.1145/3650212.3680381
Xue Z, Gao Z, Wang S, Hu X, Xia X, Li S, Christakis M, Pradel M (2024). SelfPiCo: Self-Guided Partial Code Execution with LLMs. Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (1389-1401). Online publication date: 11-Sep-2024.
https://dl.acm.org/doi/10.1145/3650212.3680368
Yin X, Ni C, Wang S, Li Z, Zeng L, Yang X, Christakis M, Pradel M (2024). ThinkRepair: Self-Directed Automated Program Repair. Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (1274-1286). Online publication date: 11-Sep-2024.
https://dl.acm.org/doi/10.1145/3650212.3680359
Song X, Wu Y, Liu S, Chen B, Lin Y, Peng X, Christakis M, Pradel M (2024). C2D2: Extracting Critical Changes for Real-World Bugs with Dependency-Sensitive Delta Debugging. Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (300-312). Online publication date: 11-Sep-2024.
https://dl.acm.org/doi/10.1145/3650212.3652129