Towards Exploring the Limitations of Test Selection Techniques on Graph Neural Networks: An Empirical Study

Published: 22 July 2024

Abstract

Graph Neural Networks (GNNs) have gained prominence in domains such as social network analysis, recommendation systems, and drug discovery, due to their ability to model complex relationships in graph-structured data. Because GNNs can exhibit incorrect behavior with severe consequences, testing them is pivotal. However, labeling all test inputs for GNNs can be prohibitively costly and time-consuming, especially for large and complex graphs. In response, test selection has emerged as a strategic approach to reduce labeling expenses by selecting only a subset of the complete test set for labeling. While various test selection techniques have been proposed for traditional deep neural networks (DNNs), adapting them to GNNs presents unique challenges because DNN and GNN test data differ: DNN test inputs are independent of each other, whereas GNN test inputs (nodes) exhibit intricate interdependencies. It therefore remains unclear whether DNN test selection approaches can perform effectively on GNNs. To fill this gap, we conduct an empirical study that systematically evaluates the effectiveness of various test selection methods in the context of GNNs, focusing on three critical aspects: 1) misclassification detection: selecting test inputs that are more likely to be misclassified; 2) accuracy estimation: selecting a small set of tests to precisely estimate the accuracy of the whole test set; 3) performance enhancement: selecting retraining inputs to improve GNN accuracy. Our study encompasses 7 graph datasets, covering both node classification and graph classification, and 8 GNN models, and evaluates 22 test selection approaches.
Our findings reveal that: 1) for GNN misclassification detection, confidence-based test selection methods, which perform well on DNNs, do not reach the same level of effectiveness; 2) for GNN accuracy estimation, clustering-based methods consistently outperform random selection, but only by a slight margin; 3) for selecting retraining inputs to improve GNN performance, confidence-based and clustering-based test selection methods are only slightly effective; 4) also for performance enhancement, node importance-based test selection methods are unsuitable and in many cases even perform worse than random selection.
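As a concrete illustration of the confidence-based selection family evaluated in the study (e.g., DeepGini), the sketch below ranks test inputs by the Gini impurity of the model's predicted class probabilities and selects the least confident ones for labeling. The function names and the toy probability vectors are illustrative assumptions, not artifacts from the paper; in practice the probabilities would come from a GNN's softmax outputs over test nodes.

```python
# Minimal sketch of DeepGini-style confidence-based test selection,
# assuming we already have softmax outputs for each test input.
# Names and toy probabilities are illustrative, not from the paper.

def deepgini_score(probs):
    """Gini impurity of a probability vector: higher means less confident."""
    return 1.0 - sum(p * p for p in probs)

def select_most_uncertain(prob_rows, budget):
    """Pick the `budget` test inputs the model is least confident about."""
    ranked = sorted(range(len(prob_rows)),
                    key=lambda i: deepgini_score(prob_rows[i]),
                    reverse=True)
    return ranked[:budget]

if __name__ == "__main__":
    prob_rows = [
        [0.98, 0.01, 0.01],  # confident prediction
        [0.34, 0.33, 0.33],  # near-uniform: likely misclassified
        [0.70, 0.20, 0.10],
    ]
    print(select_most_uncertain(prob_rows, 2))  # -> [1, 2]
```

Note that this ranking treats each input independently, which is exactly the assumption the study questions for GNNs, where a node's prediction depends on its neighbors.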


Published In

Empirical Software Engineering  Volume 29, Issue 5
Sep 2024
1352 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 22 July 2024
Accepted: 17 June 2024

Author Tags

  1. Graph neural networks
  2. Deep learning testing
  3. Test input selection
  4. Labeling

Qualifiers

  • Research-article


