DOI: 10.1145/3540250.3549175
research-article

AutoPruner: transformer-based call graph pruning

Published: 09 November 2022

Abstract

Constructing a static call graph requires trade-offs between soundness and precision. Unfortunately, program analysis techniques for constructing call graphs are usually imprecise. To address this problem, researchers have recently proposed machine-learning-based call graph pruning, which post-processes call graphs constructed by static analysis. A machine learning model captures information from the call graph by extracting structural features for use in a random forest classifier, which then removes edges predicted to be false positives. Despite the improvements these models show, they remain limited: they do not consider source code semantics and thus often cannot effectively distinguish true from false positives.
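The post-processing idea described above (extract structural features per edge, let a classifier flag false positives, drop the flagged edges) can be pictured with the minimal Python sketch below. It is not the baseline's actual feature set or model: the two features (`caller_out_degree`, `callee_in_degree`) and the threshold rule that stands in for the trained random forest are hypothetical choices made purely for illustration.

```python
from collections import Counter
from typing import Callable

Edge = tuple[str, str]  # (caller, callee)
Features = dict[str, int]


def structural_features(edges: list[Edge]) -> dict[Edge, Features]:
    """Compute illustrative per-edge structural features.

    Assumes `edges` contains no duplicate (caller, callee) pairs.
    """
    out_degree = Counter(caller for caller, _ in edges)
    in_degree = Counter(callee for _, callee in edges)
    return {
        (caller, callee): {
            "caller_out_degree": out_degree[caller],
            "callee_in_degree": in_degree[callee],
        }
        for caller, callee in edges
    }


def prune(edges: list[Edge],
          is_false_positive: Callable[[Features], bool]) -> list[Edge]:
    """Post-process a static call graph: drop edges the classifier flags."""
    features = structural_features(edges)
    return [edge for edge in edges if not is_false_positive(features[edge])]


if __name__ == "__main__":
    edges = [
        ("main", "A.run"), ("main", "B.run"), ("main", "C.run"),
        ("main", "D.run"), ("main", "log"), ("A.run", "log"),
    ]
    # Hypothetical rule standing in for a trained random forest.
    kept = prune(edges, lambda f: f["caller_out_degree"] > 3)
    print(kept)  # [('A.run', 'log')]
```

In this toy run the rule also drops `main -> log`, which may well be a true edge: features computed from graph structure alone say nothing about what the caller and callee code actually do, which is the limitation the paragraph above points out.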
In this paper, we present AutoPruner, a novel call graph pruning technique that eliminates false positives in call graphs via both statistical semantic and structural analysis. Given a call graph constructed by traditional static analysis tools, AutoPruner takes a Transformer-based approach to capture the semantic relationships between the caller and callee functions associated with each edge in the call graph. To do so, AutoPruner fine-tunes a model of code that was pre-trained on a large corpus to represent source code based on descriptions of its semantics. Next, the model is used to extract semantic features from the functions related to each edge in the call graph. AutoPruner uses these semantic features together with the structural features extracted from the call graph to classify each edge via a feed-forward neural network. Our empirical evaluation on a benchmark dataset of real-world programs shows that AutoPruner outperforms the state-of-the-art baselines, improving F-measure by up to 13% in identifying false-positive edges in a static call graph. Moreover, AutoPruner improves two client analyses, halving the false alarm rate of null pointer analysis and improving monomorphic call-site detection by over 10%. Additionally, our ablation study and qualitative analysis show that the semantic features extracted by AutoPruner capture a remarkable amount of information for distinguishing between true and false positives.
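The edge-classification step can be sketched in Python as follows. This is a toy illustration under stated assumptions, not AutoPruner's implementation: `embed` is a hypothetical hashed bag-of-tokens stand-in for the fine-tuned pre-trained code model, embedding caller and callee separately and concatenating the results is an assumption, the feed-forward network has random untrained weights, and `classify_and_prune`, the 0.5 threshold, and all other names are illustrative. A small `f_measure` helper shows how the F-measure on flagged false-positive edges combines precision and recall.

```python
import hashlib
import math
import random

Edge = tuple[str, str]  # (caller, callee)


def embed(source: str, dim: int = 8) -> list[float]:
    """Hypothetical stand-in for the fine-tuned code model.

    Hashes each whitespace-separated token into one of `dim` buckets and
    L2-normalises the counts. It captures no real semantics.
    """
    vec = [0.0] * dim
    for token in source.split():
        bucket = hashlib.sha256(token.encode()).digest()[0] % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class FeedForward:
    """One ReLU hidden layer and a sigmoid output unit.

    Weights are random placeholders; a real model would learn them from
    labelled call graph edges.
    """

    def __init__(self, n_in: int, n_hidden: int = 16, seed: int = 0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.gauss(0.0, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def predict(self, x: list[float]) -> float:
        """Return a score in [0, 1], read as P(edge is a true edge)."""
        hidden = [
            max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(self.w1, self.b1)
        ]
        logit = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        # Numerically stable sigmoid.
        if logit >= 0:
            return 1.0 / (1.0 + math.exp(-logit))
        z = math.exp(logit)
        return z / (1.0 + z)


def edge_features(caller_src: str, callee_src: str,
                  structural: list[float]) -> list[float]:
    """Concatenate semantic features of both endpoints with structural ones."""
    return embed(caller_src) + embed(callee_src) + structural


def classify_and_prune(edges: list[Edge], sources: dict[str, str],
                       structural: dict[Edge, list[float]],
                       model: FeedForward,
                       threshold: float = 0.5) -> list[Edge]:
    """Keep edges whose predicted true-edge probability is >= threshold."""
    kept = []
    for caller, callee in edges:
        x = edge_features(sources[caller], sources[callee],
                          structural[(caller, callee)])
        if model.predict(x) >= threshold:
            kept.append((caller, callee))
    return kept


def f_measure(predicted_fp: set, actual_fp: set) -> float:
    """F1 score of the edges flagged as false positives."""
    tp = len(predicted_fp & actual_fp)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted_fp)
    recall = tp / len(actual_fp)
    return 2 * precision * recall / (precision + recall)
```

With two 8-dimensional embeddings plus two structural features, the model would be built as `FeedForward(n_in=18)`. Until its weights are trained on labelled edges, its scores carry no information about which edges are false positives; the sketch only shows how the two feature families are combined into one classifier input.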

Published In

ESEC/FSE 2022: Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering
November 2022
1822 pages
ISBN:9781450394130
DOI:10.1145/3540250
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Call Graph Pruning
  2. Pretrained Language Model
  3. Static Analysis
  4. Transformer

Conference

ESEC/FSE '22

Acceptance Rates

Overall Acceptance Rate 112 of 543 submissions, 21%

Article Metrics

  • Downloads (last 12 months): 140
  • Downloads (last 6 weeks): 19
Reflects downloads up to 23 Dec 2024