DOI: 10.1109/ICPC.2017.24
research-article

Bug localization with combination of deep learning and information retrieval

Published: 20 May 2017

Abstract

Bug localization is the automated task of locating the potentially buggy files in a software project given a bug report. It helps developers focus on the crucial files. However, existing automated bug localization approaches face a key challenge called lexical mismatch: the terms used in bug reports to describe a bug differ from the terms and code tokens used in source files. To address this, we present a novel approach that uses a deep neural network (DNN) in combination with rVSM, an information retrieval (IR) technique. rVSM collects the feature on the textual similarity between bug reports and source files. The DNN is used to learn to relate the terms in bug reports to potentially different code tokens and terms in source files. Our empirical evaluation on real-world bug reports in open-source projects shows that DNN and IR complement each other well, achieving higher bug localization accuracy than the individual models. Importantly, our new model, DnnLoc, which combines the features built from DNN, rVSM, and the project's bug-fixing history, achieves higher accuracy than the state-of-the-art IR and machine learning techniques. In half of the cases, it is correct with just a single suggested file. In 66% of the cases, a correct buggy file is in the list of three suggested files. With 5 suggested files, it is correct in almost 70% of the cases.
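To make the lexical mismatch problem and the top-k accuracy figures concrete, the following is a minimal sketch and not the paper's method. It ranks files by plain term-frequency cosine similarity (a simplified stand-in, not rVSM or DnnLoc) and computes top-k accuracy as the share of bug reports with at least one truly buggy file among the first k suggestions. The file names, file contents, bug reports, and function names are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; real systems also split identifiers and stem.
    return text.lower().split()


def cosine(a, b):
    # Cosine similarity between raw term-frequency vectors of two texts.
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_files(report, files):
    # Order file names by descending textual similarity to the bug report.
    return sorted(files, key=lambda name: cosine(report, files[name]), reverse=True)


def top_k_accuracy(ranked_lists, buggy_sets, k):
    # Fraction of reports whose top-k suggestions contain a truly buggy file.
    hits = sum(1 for ranked, buggy in zip(ranked_lists, buggy_sets)
               if set(ranked[:k]) & buggy)
    return hits / len(ranked_lists)


# Hypothetical project: file name -> bag of terms extracted from the source.
files = {
    "Renderer.java": "paint canvas draw widget redraw",
    "Parser.java": "parse token stream syntax error",
}

# Shares terms with Parser.java, so textual similarity can rank it first.
matching_report = "syntax error when reading a token"
# Describes a rendering bug in words the code never uses (lexical mismatch).
mismatched_report = "screen flickers after the window is resized"

print(rank_files(matching_report, files))
print([cosine(mismatched_report, body) for body in files.values()])
```

In this toy setup the mismatched report shares no term with either file, so a purely textual score gives the ranker nothing to work with. That gap is what the abstract says the DNN is meant to bridge by learning to relate report terms to different code tokens.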

Published In

ICPC '17: Proceedings of the 25th International Conference on Program Comprehension
May 2017
399 pages
ISBN:9781538605356

Publisher

IEEE Press

Author Tags

  1. bug localization
  2. code retrieval
  3. deep learning

Qualifiers

  • Research-article

Conference

ICSE '17