DOI: 10.1145/3178876.3186120

VERSE: Versatile Graph Embeddings from Similarity Measures

Published: 23 April 2018

Abstract

Embedding a web-scale information network into a low-dimensional vector space facilitates tasks such as link prediction, classification, and visualization. Past research has addressed the problem of extracting such embeddings by adapting methods from word embedding to graphs, without defining a clearly comprehensible graph-related objective. Yet, as we show, the objectives used in past works implicitly utilize similarity measures among graph nodes. In this paper, we carry the similarity orientation of previous works to its logical conclusion; we propose VERtex Similarity Embeddings (VERSE), a simple, versatile, and memory-efficient method that derives graph embeddings explicitly calibrated to preserve the distributions of a selected vertex-to-vertex similarity measure. VERSE learns such embeddings by training a single-layer neural network. While its default, scalable version does so by sampling similarity information, we also develop a variant that uses the full similarity information per vertex. Our experimental study on standard benchmarks and real-world datasets demonstrates that VERSE, instantiated with diverse similarity measures, outperforms state-of-the-art methods in terms of precision and recall on major data mining tasks and surpasses them in time and space efficiency, while the scalable sampling-based variant achieves results as good as those of the non-scalable full variant.
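The sampling-based training scheme described in the abstract can be illustrated with a minimal sketch. It assumes that the selected vertex-to-vertex similarity is approximated by a sampler (for example, the endpoint of a short Personalized PageRank walk) and that negatives are drawn uniformly, noise-contrastive style; the function `sample_similar`, the update rule details, and all hyperparameters below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_verse_sketch(sample_similar, n_nodes, dim=128, n_neg=3,
                       lr=0.0025, n_steps=100_000, seed=0):
    """Sampling-based sketch of a VERSE-style embedding trainer.

    `sample_similar(u, rng)` is assumed to return a node v drawn with
    probability roughly proportional to the chosen similarity sim(u, v),
    e.g. the endpoint of a short Personalized PageRank walk from u.
    """
    rng = np.random.default_rng(seed)
    W = (rng.random((n_nodes, dim)) - 0.5) / dim  # single embedding matrix

    for _ in range(n_steps):
        u = int(rng.integers(n_nodes))

        # Positive update: pull u toward a node sampled from sim(u, .).
        v = sample_similar(u, rng)
        wu, wv = W[u].copy(), W[v].copy()
        g = lr * (1.0 - sigmoid(wu @ wv))
        W[u] += g * wv
        W[v] += g * wu

        # Negative updates: push u away from uniformly sampled noise nodes.
        for _ in range(n_neg):
            z = int(rng.integers(n_nodes))
            wu, wz = W[u].copy(), W[z].copy()
            g = -lr * sigmoid(wu @ wz)
            W[u] += g * wz
            W[z] += g * wu

    return W
```

The non-scalable variant mentioned in the abstract would instead compare, for each vertex, the full softmax-normalized distribution of embedding dot products against the full similarity distribution (e.g. with a KL-divergence loss), rather than relying on sampled positive and negative pairs.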

Published In

WWW '18: Proceedings of the 2018 World Wide Web Conference
April 2018
2000 pages
ISBN:9781450356398

Sponsors

  • IW3C2: International World Wide Web Conference Committee

Publisher

International World Wide Web Conferences Steering Committee

Republic and Canton of Geneva, Switzerland

Author Tags

  1. feature learning
  2. graph embedding
  3. graph representations
  4. information networks
  5. node embedding
  6. vertex similarity

Qualifiers

  • Research-article

Conference

WWW '18: The Web Conference 2018
Sponsor:
  • IW3C2
April 23 - 27, 2018
Lyon, France

Acceptance Rates

WWW '18 Paper Acceptance Rate: 170 of 1,155 submissions, 15%
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
