research-article

Scalable Algorithm for Probabilistic Overlapping Community Detection

Authors:

Kei Wakabayashi

SWM '17: Proceedings of the 1st Workshop on Scholarly Web Mining

Pages 9 - 16

https://doi.org/10.1145/3057148.3057150

Published: 10 February 2017

Abstract

In the data mining field, community detection, which decomposes a graph into multiple subgraphs, is one of the major techniques for analyzing graph data. In recent years, the scalability of community detection algorithms has become a crucial issue because real-world networks, such as co-author networks and web graphs, keep growing in size. In this paper, we propose a scalable overlapping community detection method that trains latent Dirichlet allocation (LDA) models with stochastic variational Bayesian inference; the model predicts the set of neighbor nodes of each node with a community mixture distribution. In our experiments, we show that the proposed method is much faster than previous methods and can detect communities even in a huge network with 60 million nodes and 1.8 billion edges. Furthermore, we compare different mini-batch sizes and numbers of iterations in stochastic variational Bayesian inference to determine an empirical trade-off between the efficiency and the quality of overlapping community detection.
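To make the idea of "LDA over neighbor sets" concrete, here is a minimal Python sketch under stated assumptions; it is not the author's implementation. It assumes each node is treated as a document whose words are the IDs of its neighbors, and it applies the standard stochastic variational inference updates for LDA described by Hoffman et al. to random mini-batches of nodes. The function names (`svi_lda_communities`, `local_step`, `overlapping_communities`), the hyperparameter values (`alpha`, `eta`, `tau0`, `kappa`), and the membership threshold are hypothetical choices for illustration. The digamma function is written by hand so the sketch needs only the standard library.

```python
import math
import random


def digamma(x):
    """psi(x) for x > 0: shift x up via psi(x) = psi(x + 1) - 1/x, then apply the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))


def local_step(words, e_log_beta, alpha, n_iter=50):
    """Coordinate ascent on one node's community mixture (gamma) and per-neighbor responsibilities (phi)."""
    k_count = len(e_log_beta)
    gamma = [alpha + len(words) / k_count] * k_count
    phis = []
    for _ in range(n_iter):
        # E[log theta_k] up to a constant; digamma(sum(gamma)) cancels when phi is normalized.
        e_log_theta = [digamma(g) for g in gamma]
        phis = []
        for w in words:
            logits = [e_log_theta[k] + e_log_beta[k][w] for k in range(k_count)]
            top = max(logits)
            weights = [math.exp(val - top) for val in logits]
            total = sum(weights)
            phis.append([x / total for x in weights])
        gamma = [alpha + sum(phi[k] for phi in phis) for k in range(k_count)]
    return gamma, phis


def svi_lda_communities(adjacency, num_communities, batch_size=2, num_iters=200,
                        alpha=0.1, eta=0.01, tau0=1.0, kappa=0.7, seed=0):
    """Hypothetical SVI-LDA sketch: node = document, neighbor IDs = words.

    Assumes an undirected adjacency dict in which every neighbor is also a key.
    """
    rng = random.Random(seed)
    nodes = sorted(adjacency)
    index = {v: i for i, v in enumerate(nodes)}
    docs = [[index[u] for u in sorted(adjacency[v])] for v in nodes]
    n_docs = vocab = len(nodes)
    # Global variational parameters lambda[k][w] of the community-to-neighbor distributions.
    lam = [[rng.gammavariate(100.0, 0.01) for _ in range(vocab)] for _ in range(num_communities)]

    def expected_log_beta():
        rows = []
        for row in lam:
            norm = digamma(sum(row))
            rows.append([digamma(x) - norm for x in row])
        return rows

    for t in range(num_iters):
        batch = rng.sample(range(n_docs), batch_size)  # random mini-batch of nodes
        e_log_beta = expected_log_beta()
        stats = [[0.0] * vocab for _ in range(num_communities)]
        for d in batch:
            _, phis = local_step(docs[d], e_log_beta, alpha)
            for w, phi in zip(docs[d], phis):
                for k in range(num_communities):
                    stats[k][w] += phi[k]
        # Robbins-Monro step size; mini-batch statistics are rescaled to the whole graph.
        rho = (tau0 + t) ** (-kappa)
        scale = n_docs / batch_size
        for k in range(num_communities):
            for w in range(vocab):
                lam[k][w] = (1.0 - rho) * lam[k][w] + rho * (eta + scale * stats[k][w])

    # Final pass: normalized community mixture for every node.
    e_log_beta = expected_log_beta()
    memberships = {}
    for v, words in zip(nodes, docs):
        gamma, _ = local_step(words, e_log_beta, alpha)
        total = sum(gamma)
        memberships[v] = [g / total for g in gamma]
    return memberships


def overlapping_communities(memberships, threshold=0.2):
    """Put a node in every community whose mixture weight reaches the (hypothetical) threshold."""
    communities = {}
    for v, theta in memberships.items():
        for k, weight in enumerate(theta):
            if weight >= threshold:
                communities.setdefault(k, set()).add(v)
    return communities
```

As a usage example, on an adjacency dict of two 5-cliques joined through an extra node, `svi_lda_communities(adj, num_communities=2)` returns one mixture vector per node, and `overlapping_communities` turns those vectors into possibly overlapping node sets. The `batch_size` and `num_iters` arguments correspond to the two settings whose efficiency/quality trade-off the abstract describes: larger mini-batches give less noisy updates of the global parameters at a higher cost per iteration.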




Information

Published In


SWM '17: Proceedings of the 1st Workshop on Scholarly Web Mining

February 2017

65 pages

ISBN:9781450352406

DOI:10.1145/3057148

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

In-Cooperation

Oak Ridge National Laboratory
OU: The Open University

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

Research-article
Research
Refereed limited

Conference

SWM '17

SWM '17: 1st Workshop on Scholarly Web Mining

February 10, 2017

Cambridge, United Kingdom

Acceptance Rates

SWM '17 Paper Acceptance Rate: 8 of 17 submissions, 47%

Overall Acceptance Rate: 8 of 17 submissions, 47%


Article Metrics

Total Citations: 0
Total Downloads: 85
Downloads (last 12 months): 1
Downloads (last 6 weeks): 0

Reflects downloads up to 14 Jan 2025
