DOI:10.1145/3057148.3057150

Scalable Algorithm for Probabilistic Overlapping Community Detection

Published: 10 February 2017

Abstract

In data mining, community detection, which decomposes a graph into multiple subgraphs, is one of the major techniques for analyzing graph data. In recent years, the scalability of community detection algorithms has become a crucial issue because of the growing size of real-world networks such as co-author networks and web graphs. In this paper, we propose a scalable overlapping community detection method that uses stochastic variational Bayesian training of latent Dirichlet allocation (LDA) models, which predict sets of neighbor nodes with a community mixture distribution. In our experiments, we show that the proposed method is much faster than previous methods and can detect communities even in a huge network that contains 60 million nodes and 1.8 billion edges. Furthermore, we compare different mini-batch sizes and numbers of iterations in stochastic variational Bayesian inference to determine an empirical trade-off between the efficiency and the quality of overlapping community detection.
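The idea summarized in the abstract can be illustrated with a small sketch. The code below is not the authors' implementation; it is a minimal pure-Python example of stochastic variational inference for LDA in which each node is treated as a "document" whose "words" are its neighbor nodes, so that the per-node topic mixture plays the role of an overlapping community membership. The function names (`svi_communities`, `local_step`), the hyperparameters (`alpha`, `eta`, `tau0`, `kappa`), the step-size schedule, and the toy graph are all assumptions chosen for illustration.

```python
import math
import random

def digamma(x):
    """Digamma for x > 0 via the recurrence plus an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series

def exp_elog_dirichlet(params):
    """exp(E[log x_i]) for x ~ Dirichlet(params)."""
    total = digamma(sum(params))
    return [math.exp(digamma(p) - total) for p in params]

def local_step(nbrs, exp_elog_beta, K, alpha, iters=20):
    """Fit one node's community mixture (gamma) from its neighbor indices."""
    gamma, phi = [1.0] * K, []
    for _ in range(iters):
        exp_elog_theta = exp_elog_dirichlet(gamma)
        phi = []
        for w in nbrs:
            weights = [exp_elog_theta[k] * exp_elog_beta[k][w] for k in range(K)]
            norm = sum(weights) + 1e-100
            phi.append([x / norm for x in weights])
        gamma = [alpha + sum(p[k] for p in phi) for k in range(K)]
    return gamma, phi

def svi_communities(adj, K, batch_size=4, epochs=50, alpha=0.1, eta=0.01,
                    tau0=1.0, kappa=0.7, seed=0):
    """Return {node: community mixture} learned with mini-batch updates."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    index = {v: i for i, v in enumerate(nodes)}
    D = len(nodes)
    # lam[k][w]: variational Dirichlet parameter of community k over node w.
    lam = [[rng.gammavariate(100.0, 0.01) for _ in range(D)] for _ in range(K)]
    t = 0
    for _ in range(epochs):
        order = nodes[:]
        rng.shuffle(order)
        for start in range(0, D, batch_size):
            batch = order[start:start + batch_size]
            exp_elog_beta = [exp_elog_dirichlet(row) for row in lam]
            stats = [[0.0] * D for _ in range(K)]
            for v in batch:
                nbrs = [index[u] for u in adj[v]]
                _, phi = local_step(nbrs, exp_elog_beta, K, alpha)
                for w, p in zip(nbrs, phi):
                    for k in range(K):
                        stats[k][w] += p[k]
            rho = (tau0 + t) ** (-kappa)   # decaying step size (assumed schedule)
            scale = D / len(batch)         # rescale the mini-batch to the full graph
            for k in range(K):
                for w in range(D):
                    target = eta + scale * stats[k][w]
                    lam[k][w] = (1.0 - rho) * lam[k][w] + rho * target
            t += 1
    exp_elog_beta = [exp_elog_dirichlet(row) for row in lam]
    membership = {}
    for v in nodes:
        gamma, _ = local_step([index[u] for u in adj[v]], exp_elog_beta, K, alpha)
        total = sum(gamma)
        membership[v] = [g / total for g in gamma]
    return membership

# Hypothetical toy graph: two triangles joined through the bridge node "d".
adj = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c", "e"],
       "e": ["d", "f", "g"], "f": ["e", "g"], "g": ["e", "f"]}
membership = svi_communities(adj, K=2)
```

In this sketch, `batch_size` and `epochs` are the knobs that correspond to the mini-batch size and number of iterations whose trade-off the abstract mentions; each update touches only the nodes in the current mini-batch, which is what makes this style of training attractive for very large graphs.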

Published In

SWM '17: Proceedings of the 1st Workshop on Scholarly Web Mining
February 2017
65 pages
ISBN:9781450352406
DOI:10.1145/3057148

In-Cooperation

  • Oak Ridge National Laboratory
  • OU: The Open University

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SWM '17
SWM '17: 1st Workshop on Scholarly Web Mining
February 10, 2017
Cambridge, United Kingdom

Acceptance Rates

SWM '17 Paper Acceptance Rate 8 of 17 submissions, 47%;
Overall Acceptance Rate 8 of 17 submissions, 47%
