PANE: scalable and effective attributed network embedding

Published: 24 March 2023

Abstract

Given a graph G where each node is associated with a set of attributes, attributed network embedding (ANE) maps each node v ∈ G to a compact vector Xv, which can be used in downstream machine learning tasks. Ideally, Xv should capture node v’s affinity to each attribute, considering not only v’s own attribute associations but also those of its connected nodes along edges in G. It is challenging to obtain high-utility embeddings that enable accurate predictions; scaling effective ANE computation to massive graphs with millions of nodes pushes the difficulty of the problem to a whole new level. Existing solutions largely fail on such graphs, incurring prohibitive costs, low-quality embeddings, or both. This paper proposes PANE, an effective and scalable approach to ANE computation for massive graphs that achieves state-of-the-art result quality on multiple benchmark datasets, measured by the accuracy of three common prediction tasks: attribute inference, link prediction, and node classification. PANE obtains high scalability and effectiveness through three main algorithmic designs. First, it formulates the learning objective based on a novel random walk model for attributed networks. The resulting optimization task is still challenging on large graphs. Second, PANE includes a highly efficient solver for this optimization problem, whose key module is a carefully designed initialization of the embeddings, which drastically reduces the number of iterations required to converge. Finally, PANE utilizes multi-core CPUs through non-trivial parallelization of the above solver, which achieves scalability while retaining the high quality of the resulting embeddings. Since the performance of PANE depends on the number of attributes in the input network, we further extend PANE to PANE++, which handles large networks with numerous attributes via an effective attribute clustering technique.
Extensive experiments comparing against 10 existing approaches on 8 real datasets demonstrate that PANE and PANE++ consistently outperform all existing methods in result quality, while being orders of magnitude faster.
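The recipe the abstract outlines — score each node’s affinity to every attribute via random walks that also aggregate the attribute associations of reachable neighbors, then factorize the affinity matrix into compact node and attribute embeddings — can be conveyed with a small sketch. This is not PANE’s actual algorithm (its random walk model, initialization, and parallel solver are developed in the paper body); the function names, the restart probability `alpha`, the step budget, and the log-scaling before factorization are all illustrative assumptions:

```python
import numpy as np

def node_attribute_affinity(A, R, alpha=0.15, steps=20):
    """Node-to-attribute affinities via a truncated random walk with restart.

    A: (n, n) adjacency matrix; R: (n, d) node-attribute association matrix.
    F[v, r] mixes v's own attribute distribution with the distributions of
    nodes reachable from v, discounted geometrically by walk length.
    """
    P = A / np.maximum(A.sum(axis=1, keepdims=True), 1e-12)   # transition probs
    Rn = R / np.maximum(R.sum(axis=1, keepdims=True), 1e-12)  # attr. distributions
    n = A.shape[0]
    walk = np.eye(n)                  # t-step walk distribution from each node
    F = np.zeros_like(Rn)
    for t in range(steps):
        F += alpha * (1.0 - alpha) ** t * (walk @ Rn)
        walk = walk @ P               # extend every walk by one more step
    return F

def factorize(F, k):
    """Split log-scaled affinities into node (X) and attribute (Y) embeddings."""
    U, s, Vt = np.linalg.svd(np.log1p(F), full_matrices=False)
    X = U[:, :k] * np.sqrt(s[:k])     # node embeddings: X @ Y.T ~ log1p(F)
    Y = Vt[:k].T * np.sqrt(s[:k])     # attribute embeddings
    return X, Y
```

On massive graphs the dense matrices above would be replaced by sparse operations, and a practical solver would not rely on a full SVD; the sketch only conveys the affinity-then-factorize structure that the abstract describes.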


Published In

The VLDB Journal — The International Journal on Very Large Data Bases  Volume 32, Issue 6
Nov 2023
233 pages

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 24 March 2023
Accepted: 02 March 2023
Revision received: 22 February 2023
Received: 01 April 2022

Author Tags

  1. Network embedding
  2. Attributed graph
  3. Random walk
  4. Matrix factorization
  5. Scalability

Qualifiers

  • Research-article
