DOI: 10.1145/3210459.3210469
research-article

Bug Localization with Semantic and Structural Features using Convolutional Neural Network and Cascade Forest

Published: 28 June 2018

Abstract

Background: Correctly localizing buggy files for bug reports using both their semantic and structural information is a crucial task, and doing so would substantially improve the accuracy of bug localization techniques. Aims: To empirically evaluate and demonstrate the effects of both semantic and structural information in bug reports and source files on bug localization performance, we propose CNN_Forest, which combines a convolutional neural network and an ensemble of random forests, two techniques that perform very well in semantic parsing and structural information extraction. Method: We first employ a convolutional neural network with multiple filters and an ensemble of random forests with multi-grained scanning to extract semantic and structural features from the word vectors derived from bug reports and source files. A subsequent cascade forest (a cascade of ensembles of random forests) is then used to extract deeper features and observe the correlated relationships between bug reports and source files. CNN_Forest is then empirically evaluated on 10,754 bug reports extracted from the AspectJ, Eclipse UI, JDT, SWT, and Tomcat projects. Results: The experiments empirically demonstrate the significance of including semantic and structural information in bug localization, and further show that the proposed CNN_Forest achieves higher Mean Average Precision and Mean Reciprocal Rank than the best results of four current state-of-the-art approaches (NPCNN, LR+WE, DNNLOC, and BugLocator). Conclusion: CNN_Forest is capable of defining the correlated relationships between bug reports and source files, and we empirically show that semantic and structural information in bug reports and source files is crucial to improving bug localization.
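
The two evaluation measures named in the abstract, Mean Average Precision (MAP) and Mean Reciprocal Rank (MRR), can be illustrated with a minimal sketch. This uses the standard textbook definitions, not code from the paper. The function names and the input format (one list of booleans per bug report, ordered by the model's ranking, with `True` marking a file that is actually buggy) are assumptions made for illustration. The sketch also assumes each ranking covers every candidate file, so all buggy files appear somewhere in the list.

```python
from typing import Sequence


def average_precision(ranked_relevance: Sequence[bool]) -> float:
    """Average precision for one bug report.

    Averages precision@k over every rank k that holds a buggy file.
    Returns 0.0 when the ranking contains no buggy file.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_buggy in enumerate(ranked_relevance, start=1):
        if is_buggy:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def reciprocal_rank(ranked_relevance: Sequence[bool]) -> float:
    """Reciprocal of the rank of the first buggy file (0.0 if none is found)."""
    for rank, is_buggy in enumerate(ranked_relevance, start=1):
        if is_buggy:
            return 1.0 / rank
    return 0.0


def mean_average_precision(rankings: Sequence[Sequence[bool]]) -> float:
    """MAP: mean of the per-report average precision values."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


def mean_reciprocal_rank(rankings: Sequence[Sequence[bool]]) -> float:
    """MRR: mean of the per-report reciprocal ranks."""
    return sum(reciprocal_rank(r) for r in rankings) / len(rankings)


if __name__ == "__main__":
    # Hypothetical rankings for two bug reports (illustrative data only).
    rankings = [
        [False, True, False, True],   # buggy files at ranks 2 and 4
        [True, False, False, False],  # buggy file at rank 1
    ]
    print(mean_average_precision(rankings))
    print(mean_reciprocal_rank(rankings))
```

Both measures reward placing buggy files near the top of the ranked list. MRR considers only the first buggy file found, while MAP accounts for the positions of all buggy files linked to a report.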

Published In

EASE '18: Proceedings of the 22nd International Conference on Evaluation and Assessment in Software Engineering 2018
June 2018
223 pages
ISBN:9781450364034
DOI:10.1145/3210459

In-Cooperation

  • The University of Canterbury

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. bug localization
  2. cascade forest
  3. convolutional neural network
  4. semantic information
  5. structural information
  6. word embedding

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

EASE'18

Acceptance Rates

Overall Acceptance Rate 71 of 232 submissions, 31%
