research-article

Bug localization with combination of deep learning and information retrieval

Authors: Anh Tuan Nguyen, Hoan Anh Nguyen, Tien N. Nguyen

ICPC '17: Proceedings of the 25th International Conference on Program Comprehension

Pages 218 - 229

https://doi.org/10.1109/ICPC.2017.24

Published: 20 May 2017

Abstract

Bug localization is the automated task of locating the potentially buggy files in a software project, given a bug report. It helps developers focus on crucial files. However, existing automated bug localization approaches face a key challenge called lexical mismatch: the terms used in bug reports to describe a bug differ from the terms and code tokens used in source files. To address this, we present a novel approach that uses a deep neural network (DNN) in combination with rVSM, an information retrieval (IR) technique. rVSM provides a feature that captures the textual similarity between bug reports and source files. The DNN learns to relate the terms in bug reports to potentially different code tokens and terms in source files. Our empirical evaluation on real-world bug reports from open-source projects shows that DNN and IR complement each other well and achieve higher bug localization accuracy than either model alone. Importantly, our new model, DnnLoc, which combines features built from the DNN, rVSM, and the project's bug-fixing history, achieves higher accuracy than state-of-the-art IR and machine learning techniques. In half of the cases, it is correct with just a single suggested file. In 66% of the cases, a correct buggy file is in the list of three suggested files. With five suggested files, it is correct in almost 70% of the cases.
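The accuracy figures in the abstract read as top-k accuracy: a bug report counts as a success when at least one truly buggy file appears among the first k suggested files. The following is a minimal sketch of that metric under this reading. The function name `top_k_accuracy`, the bug report IDs, and the file names are all hypothetical and are not taken from the paper or from DnnLoc.

```python
from typing import Dict, List, Set


def top_k_accuracy(
    rankings: Dict[str, List[str]],
    buggy_files: Dict[str, Set[str]],
    k: int,
) -> float:
    """Fraction of bug reports with at least one truly buggy file in the top k suggestions."""
    if not rankings:
        return 0.0
    hits = sum(
        1
        for report_id, ranked in rankings.items()
        # A report is a hit if the top-k slice overlaps the set of fixed files.
        if buggy_files.get(report_id, set()) & set(ranked[:k])
    )
    return hits / len(rankings)


# Hypothetical toy data: ranked suggestions per bug report, and the files actually fixed.
rankings = {
    "bug-1": ["Parser.java", "Lexer.java", "Main.java"],
    "bug-2": ["Cache.java", "Store.java", "Index.java"],
    "bug-3": ["View.java", "Model.java", "Ctrl.java"],
    "bug-4": ["Net.java", "Http.java", "Url.java"],
}
buggy_files = {
    "bug-1": {"Parser.java"},
    "bug-2": {"Index.java"},
    "bug-3": {"Model.java", "Ctrl.java"},
    "bug-4": {"Socket.java"},
}

for k in (1, 2, 3):
    print(f"top-{k} accuracy: {top_k_accuracy(rankings, buggy_files, k):.2f}")
```

On this toy data the metric rises from 0.25 at k = 1 to 0.75 at k = 3, mirroring how the reported accuracy grows as more files are suggested.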

Published In

ICPC '17: Proceedings of the 25th International Conference on Program Comprehension

May 2017

399 pages

ISBN:9781538605356

General Chair: Giuseppe Scanniello (University of Basilicata, Italy)

Program Chairs: David Lo (Singapore Management University, Singapore), Alexander Serebrenik (Eindhoven University of Technology, The Netherlands)

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering
IEEE-CS: Computer Society
SADIO: SADIO

Publisher

IEEE Press

Qualifiers

Research-article

Conference

ICSE '17

Sponsor:

SIGSOFT
IEEE-CS
SADIO

ICSE '17: 39th International Conference on Software Engineering

May 20 - 28, 2017

Buenos Aires, Argentina
