PANE: scalable and effective attributed network embedding

Published: 24 March 2023

Abstract

Given a graph G where each node is associated with a set of attributes, attributed network embedding (ANE) maps each node v ∈ G to a compact vector Xv, which can be used in downstream machine learning tasks. Ideally, Xv should capture node v’s affinity to each attribute, considering not only v’s own attribute associations but also those of its connected nodes along edges in G. It is challenging to obtain high-utility embeddings that enable accurate predictions; scaling effective ANE computation to massive graphs with millions of nodes pushes the difficulty of the problem to a whole new level. Existing solutions largely fail on such graphs, incurring prohibitive costs, low-quality embeddings, or both. This paper proposes PANE, an effective and scalable approach to ANE computation for massive graphs that achieves state-of-the-art result quality on multiple benchmark datasets, measured by the accuracy of three common prediction tasks: attribute inference, link prediction, and node classification. PANE obtains high scalability and effectiveness through three main algorithmic designs. First, it formulates the learning objective based on a novel random walk model for attributed networks. The resulting optimization task is still challenging on large graphs. Second, PANE includes a highly efficient solver for this optimization problem, whose key module is a carefully designed initialization of the embeddings, which drastically reduces the number of iterations required to converge. Finally, PANE utilizes multi-core CPUs through non-trivial parallelization of the above solver, which achieves scalability while retaining the high quality of the resulting embeddings. Since the performance of PANE depends on the number of attributes in the input network, we further extend PANE to PANE++, which handles large networks with numerous attributes via an effective attribute clustering technique.
Extensive experiments comparing against 10 existing approaches on 8 real datasets demonstrate that PANE and PANE++ consistently outperform all existing methods in result quality, while being orders of magnitude faster.
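The recipe the abstract outlines — score each node’s affinity to every attribute via random walks that also aggregate the attribute associations of reachable neighbors, then factorize the affinity matrix into compact node and attribute embeddings — can be conveyed with a small sketch. This is not PANE’s actual algorithm (its random walk model, initialization, and parallel solver are developed in the paper body); the function names, the restart probability `alpha`, the step budget, and the log-scaling before factorization are all illustrative assumptions:

```python
import numpy as np

def node_attribute_affinity(A, R, alpha=0.15, steps=20):
    """Node-to-attribute affinities via a truncated random walk with restart.

    A: (n, n) adjacency matrix; R: (n, d) node-attribute association matrix.
    F[v, r] mixes v's own attribute distribution with the distributions of
    nodes reachable from v, discounted geometrically by walk length.
    """
    P = A / np.maximum(A.sum(axis=1, keepdims=True), 1e-12)   # transition probs
    Rn = R / np.maximum(R.sum(axis=1, keepdims=True), 1e-12)  # attr. distributions
    n = A.shape[0]
    walk = np.eye(n)                  # t-step walk distribution from each node
    F = np.zeros_like(Rn)
    for t in range(steps):
        F += alpha * (1.0 - alpha) ** t * (walk @ Rn)
        walk = walk @ P               # extend every walk by one more step
    return F

def factorize(F, k):
    """Split log-scaled affinities into node (X) and attribute (Y) embeddings."""
    U, s, Vt = np.linalg.svd(np.log1p(F), full_matrices=False)
    X = U[:, :k] * np.sqrt(s[:k])     # node embeddings: X @ Y.T ~ log1p(F)
    Y = Vt[:k].T * np.sqrt(s[:k])     # attribute embeddings
    return X, Y
```

On massive graphs the dense matrices above would be replaced by sparse operations, and a practical solver would not rely on a full SVD; the sketch only conveys the affinity-then-factorize structure that the abstract describes.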


Published In

The VLDB Journal — The International Journal on Very Large Data Bases  Volume 32, Issue 6
Nov 2023
233 pages

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 24 March 2023
Accepted: 02 March 2023
Revision received: 22 February 2023
Received: 01 April 2022

Author Tags

  1. Network embedding
  2. Attributed graph
  3. Random walk
  4. Matrix factorization
  5. Scalability

Qualifiers

  • Research-article
