Efficient Training of Graph Neural Networks on Large Graphs

Published: 08 November 2024

Abstract

Graph Neural Networks (GNNs) have gained significant popularity for learning representations of graph-structured data. Mainstream GNNs employ the message passing scheme, which iteratively propagates information between connected nodes along edges. However, this scheme incurs high training costs, hindering the applicability of GNNs to large graphs. Recently, the database community has extensively researched effective solutions for efficient GNN training on massive graphs. In this tutorial, we provide a comprehensive overview of the GNN training process based on the graph data lifecycle, covering the graph preprocessing, batch generation, data transfer, and model training stages. We discuss recent data management efforts aimed at accelerating individual stages or improving overall training efficiency. Recognizing the distinct training issues associated with static and dynamic graphs, we first focus on efficient GNN training on static graphs, followed by an exploration of training GNNs on dynamic graphs. Finally, we suggest potential research directions in this area. We believe this tutorial will help researchers and practitioners understand the bottlenecks of GNN training and the advanced data management techniques that accelerate the training of different GNNs on massive graphs in diverse hardware settings.
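
To make the four-stage data lifecycle concrete, below is a minimal sketch of one mean-aggregation message-passing layer and a sample-based mini-batch training loop in plain PyTorch. It is purely illustrative and not the tutorial's implementation: the toy graph, the MeanAggrGNN layer, and the sample_one_hop helper are assumptions made here for demonstration.

    # Illustrative sketch (not from the tutorial) of the four lifecycle stages:
    # graph preprocessing, batch generation, data transfer, and model training.
    import torch
    import torch.nn.functional as F

    # ---- Stage 1: graph preprocessing (build a toy graph once, on the host) ----
    num_nodes, feat_dim, num_classes = 1000, 32, 5
    edges = torch.randint(0, num_nodes, (2, 5000))   # (src, dst) pairs
    feats = torch.randn(num_nodes, feat_dim)          # node features on the host
    labels = torch.randint(0, num_classes, (num_nodes,))

    def sample_one_hop(seeds, edges):
        """Stage 2: batch generation -- keep edges whose destination is a seed."""
        mask = torch.isin(edges[1], seeds)
        return edges[:, mask]

    class MeanAggrGNN(torch.nn.Module):
        """One message-passing layer: mean-aggregate neighbor messages, then a linear map."""
        def __init__(self, in_dim, out_dim):
            super().__init__()
            self.lin = torch.nn.Linear(2 * in_dim, out_dim)

        def forward(self, x, edges):
            src, dst = edges
            # Sum incoming messages per destination node, then normalize by in-degree.
            agg = torch.zeros_like(x).index_add_(0, dst, x[src])
            deg = torch.zeros(x.size(0), device=x.device).index_add_(
                0, dst, torch.ones_like(dst, dtype=x.dtype)).clamp(min=1)
            agg = agg / deg.unsqueeze(1)
            return self.lin(torch.cat([x, agg], dim=1))

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = MeanAggrGNN(feat_dim, num_classes).to(device)
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)

    for seeds in torch.randperm(num_nodes).split(128):    # mini-batches of seed nodes
        batch_edges = sample_one_hop(seeds, edges)        # Stage 2 on the host
        # ---- Stage 3: data transfer (host -> device), often the bottleneck ----
        x = feats.to(device)
        e = batch_edges.to(device)
        y = labels[seeds].to(device)
        # ---- Stage 4: model training ----
        out = model(x, e)[seeds.to(device)]
        loss = F.cross_entropy(out, y)
        opt.zero_grad(); loss.backward(); opt.step()

Note that this sketch naively re-transfers the full feature matrix to the device in every iteration; avoiding exactly this kind of redundant data movement, for example by caching frequently accessed features on the GPU, is one of the data management optimizations the tutorial surveys.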

Published In

Proceedings of the VLDB Endowment, Volume 17, Issue 12 (August 2024), 837 pages.

Publisher

VLDB Endowment
