DOI: 10.1145/3589334.3645629
research-article

When Imbalance Meets Imbalance: Structure-driven Learning for Imbalanced Graph Classification

Published: 13 May 2024

Abstract

Graph Neural Networks (GNNs) can learn representative graph-level features for efficient graph classification. However, GNNs usually assume that both the class distribution and the structure distribution are balanced. Previous works have studied graph classification under either class imbalance or structural imbalance, but they have largely overlooked the fact that the two are often intertwined in real-world data. In this paper, we propose ImbGNN, a structure-driven learning framework that addresses intertwined class imbalance and structural imbalance in graph classification. Specifically, we find that feature-oriented augmentation (e.g., feature masking) and structure-oriented augmentation (e.g., edge perturbation) affect different graphs differently. We therefore design an optional augmentation scheme based on the average degree distribution to alleviate structural imbalance. Furthermore, to account for the imbalance in the graph size distribution, we use a similarity-friendly graph random walk to extract a core subgraph, which improves the accuracy of graph kernel similarity calculation, and then construct a more reasonable kernel-based graph of graphs, thereby alleviating both class imbalance and size imbalance. Extensive experiments on multiple benchmark datasets demonstrate that ImbGNN outperforms previous baselines on imbalanced graph classification tasks. The code of ImbGNN is available at https://github.com/Xiaovy/ImbGNN.
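The idea of choosing an augmentation according to a graph's average degree can be illustrated with a minimal sketch. This is not the ImbGNN implementation: the abstract does not say which augmentation is applied to which graphs, so the rule used here (feature masking below a degree threshold, edge perturbation otherwise) is purely an assumption for illustration, as are the function names, the `threshold` and `rate` parameters, and the choice to implement edge perturbation as edge dropping only.

```python
import random
from typing import Dict, List, Set, Tuple

Graph = Dict[int, Set[int]]          # undirected adjacency: node -> neighbours
Features = Dict[int, List[float]]    # node -> feature vector


def average_degree(adj: Graph) -> float:
    """Mean number of neighbours per node (0.0 for an empty graph)."""
    if not adj:
        return 0.0
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)


def mask_features(feats: Features, rate: float, rng: random.Random) -> Features:
    """Feature-oriented augmentation: zero each entry with probability `rate`."""
    return {
        node: [0.0 if rng.random() < rate else x for x in row]
        for node, row in feats.items()
    }


def drop_edges(adj: Graph, rate: float, rng: random.Random) -> Graph:
    """Structure-oriented augmentation: drop each undirected edge with probability `rate`."""
    out: Graph = {node: set() for node in adj}
    for u in sorted(adj):
        for v in sorted(adj[u]):
            # Visit each undirected edge once (u < v) and keep both directions in sync.
            if u < v and rng.random() >= rate:
                out[u].add(v)
                out[v].add(u)
    return out


def augment(
    adj: Graph,
    feats: Features,
    threshold: float,
    rate: float = 0.2,
    seed: int = 0,
) -> Tuple[Graph, Features, str]:
    """Pick one augmentation per graph from its average degree.

    ASSUMPTION (illustrative only): graphs whose average degree is below
    `threshold` keep their edges and get feature masking; denser graphs
    get edge dropping. The paper's actual selection rule may differ.
    """
    rng = random.Random(seed)
    if average_degree(adj) < threshold:
        return adj, mask_features(feats, rate, rng), "feature_masking"
    return drop_edges(adj, rate, rng), feats, "edge_perturbation"


if __name__ == "__main__":
    # A 3-node path graph has average degree 4/3, below the threshold of 2.0.
    path = {0: {1}, 1: {0, 2}, 2: {1}}
    feats = {0: [1.0, 2.0], 1: [3.0, 4.0], 2: [5.0, 6.0]}
    _, _, chosen = augment(path, feats, threshold=2.0)
    print(chosen)  # prints "feature_masking"
```

Passing a fixed `seed` keeps the sketch reproducible; in a training loop one would typically draw a fresh seed per epoch so that each pass sees a different augmented view.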

Supplemental Material

MP4 File
Supplemental video

    Published In

    WWW '24: Proceedings of the ACM Web Conference 2024
    May 2024
    4826 pages
    ISBN:9798400701719
    DOI:10.1145/3589334
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. augmentation
    2. class imbalance
    3. graph classification
    4. graph of graphs
    5. structural imbalance

    Qualifiers

    • Research-article

    Conference

    WWW '24
    WWW '24: The ACM Web Conference 2024
    May 13 - 17, 2024
    Singapore, Singapore

    Acceptance Rates

    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
