skip to main content
research-article
Public Access

Scalable and Efficient Flow-Based Community Detection for Large-Scale Graph Analysis

Published: 06 March 2017 Publication History

Abstract

Community detection is an increasingly popular approach to uncover important structures in large networks. Flow-based community detection methods rely on communication patterns of the network rather than structural properties to determine communities. The Infomap algorithm in particular optimizes a novel objective function called the map equation and has been shown to outperform other approaches in third-party benchmarks. However, Infomap and its variants are inherently sequential, limiting their use for large-scale graphs.
In this article, we propose a novel algorithm to optimize the map equation called RelaxMap. RelaxMap provides two important improvements over Infomap: parallelization, so that the map equation can be optimized over much larger graphs, and prioritization, so that the most important work occurs first, iterations take less time, and the algorithm converges faster. We implement these techniques using OpenMP on shared-memory multicore systems, and evaluate our approach on a variety of graphs from standard graph clustering benchmarks as well as real graph datasets. Our evaluation shows that both techniques are effective: RelaxMap achieves 70% parallel efficiency on eight cores, and prioritization improves algorithm performance by an additional 20--50% on average, depending on the graph properties. Additionally, RelaxMap converges in the similar number of iterations and provides solutions of equivalent quality as the serial Infomap implementation.

References

[1]
Rodrigo Aldecoa and Ignacio Marín. 2013. Exploring the limits of community detection strategies in complex networks. Scientific Reports 3, 2216 (2013).
[2]
Seung-Hee Bae, Daniel Halperin, Jevin West, Martin Rosvall, and Bill Howe. 2013. Scalable flow-based community detection for large-scale network analysis. In Proceedings of 2013 IEEE 13th International Conference on Data Mining Workshops (ICDMW’13). IEEE, 303--310.
[3]
Sanjukta Bhowmick and Sriram Srinivasan. 2013. A template for parallelizing the Louvain method for modularity maximization. In Dynamics On and Of Complex Networks, Volume 2. Animesh Mukherjee, Monojit Choudhury, Fernando Peruani, Niloy Ganguly, and Bivas Mitra (Eds.). Springer, New York, NY, 111--124.
[4]
Vincent D. Blondel, Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre. 2008. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment 2008, 10 (2008), P10008.
[5]
Paolo Boldi and Sebastiano Vigna. 2004. The WebGraph framework I: Compression techniques. In Proc. of the Thirteenth International World Wide Web Conference (WWW’04). ACM, Manhattan, 595--601.
[6]
Sergey Brin and Lawrence Page. 1998. The anatomy of a large-scale hypertextual web search engine. Computer Networks and ISDN Systems 30, 1 (1998), 107--117.
[7]
Aaron Clauset, Mark E. J. Newman, and Cristopher Moore. 2004. Finding community structure in very large networks. Physical Review E: Statistical, Nonlinear, and Soft Matter Physics 70, 6 (2004), 066111.
[8]
Anne Condon and Richard M. Karp. 2001. Algorithms for graph partitioning on the planted partition model. Random Structures and Algorithms 18, 2 (2001), 116--140.
[9]
Leon Danon, Albert Diaz-Guilera, Jordi Duch, and Alex Arenas. 2005. Comparing community structure identification. Journal of Statistical Mechanics: Theory and Experiment 2005, 9 (2005), P09008.
[10]
Jaliya Ekanayake, Hui Li, Bingjing Zhang, Thilina Gunarathne, Seung-Hee Bae, Judy Qiu, and Geoffrey Fox. 2010. Twister: A runtime for iterative mapreduce. In Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing. ACM, 810--818.
[11]
Bas Fagginger Auer and Rob H. Bisseling. 2013. Graph coarsening and clustering on the GPU. In Graph Partitioning and Graph Clustering: 10th DIMACS Implementation Challenge Workshop, vol. 588. American Mathematical Society.
[12]
Santo Fortunato and Marc Barthélemy. 2007. Resolution limit in community detection. Proceedings of the National Academy of Sciences 104, 1 (Jan. 2007), 36--41.
[13]
Anne-Claude Gavin, Patrick Aloy, Paola Grandi, Roland Krause, Markus Boesche, Martina Marzioch, Christina Rau, Lars Juhl Jensen, Sonja Bastuck, Birgit Dümpelfeld, and others. 2006. Proteome survey reveals modularity of the yeast cell machinery. Nature 440, 7084 (2006), 631--636.
[14]
Michelle Girvan and Mark E. J. Newman. 2002. Community structure in social and biological networks. Proceedings of the National Academy of Sciences 99, 12 (2002), 7821--7826.
[15]
David F. Gleich and Leonid Zhukov. 2005. Scalable computing with power-law graphs: Experience with parallel PageRank. In Proceedings of 2005 IEEE SuperComputing Conference (Poster).
[16]
Roger Guimera and Luis A. Nunes Amaral. 2005. Functional cartography of complex metabolic networks. Nature 433, 7028 (2005), 895--900.
[17]
Roger Guimerà, Marta Sales-Pardo, and L. A. N. Amaral. 2004. Modularity from fluctuations in random graphs and complex networks. Physical Review E: Statistical, Nonlinear, and Soft Matter Physics 70, 2 (2004), 025101(R).
[18]
Tatsuro Kawamoto and Martin Rosvall. 2015. Estimating the resolution limit of the map equation in community detection. Phys. Rev. E 91, 1 (Jan. 2015), 012809.
[19]
Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. 2010. What is Twitter, a social network or a news media?. In Proceedings of the 19th International Conference on World Wide Web (WWW’10). ACM, New York, NY, 591--600.
[20]
Andrea Lancichinetti and Santo Fortunato. 2009. Community detection algorithms: A comparative analysis. Physical Review E: Statistical, Nonlinear, and Soft Matter Physics 80 (2009), 056117.
[21]
Andrea Lancichinetti, Santo Fortunato, and Filippo Radicchi. 2008. Benchmark graphs for testing community detection algorithms. Physical Review E: Statistical, Nonlinear, and Soft Matter Physics 78, 4 (2008), 046110.
[22]
Jure Leskovec and Andrej Krevl. 2014. SNAP Datasets: Stanford Large Network Dataset Collection. (2014). Retrieved June 2014 from http://snap.stanford.edu/data.
[23]
Jure Leskovec, Kevin J. Lang, and Michael Mahoney. 2010. Empirical comparison of algorithms for network community detection. In Proceedings of the 19th International Conference on World Wide Web (WWW’10). ACM, New York, NY, 631--640.
[24]
Xin Liu and Tsuyoshi Murata. 2010. Advanced modularity-specialized label propagation algorithm for detecting communities in networks. Physica A: Statistical Mechanics and its Applications 389, 7 (2010), 1493--1500.
[25]
Yucheng Low, Danny Bickson, Joseph Gonzalez, Carlos Guestrin, Aapo Kyrola, and Joseph M. Hellerstein. 2012. Distributed GraphLab: A framework for machine learning and data mining in the cloud. In Proceedings of the VLDB Endowment 5, 8 (Apr. 2012), 716--727. http://dl.acm.org/citation. cfm?id=2212351.2212354
[26]
Padmanabhan K. Menon, Gregory D. Sweriduk, and Karl D. Bilimoria. 2004. New approach for modeling, analysis, and control of air traffic flow. Journal of Guidance, Control, and Dynamics 27, 5 (2004), 737--744.
[27]
Mark E. J. Newman and Michelle Girvan. 2004. Finding and evaluating community structure in networks. Physical Review E: Statistical, Nonlinear, and Soft Matter Physics 69, 2 (2004), 026113.
[28]
Feng Niu, Benjamin Recht, Christopher Ré, and Stephen J. Wright. 2011. Hogwild!: A lock-free approach to parallelizing stochastic gradient descent. In NIPS. http://books.nips.cc/papers/files/nips24/NIPS2011_ 0485.pdf.
[29]
OpenMP Architecture Review Board. 2008. OpenMP Application Program Interface Version 3.0. (2008). Retrieved May 2008 from http://www.openmp.org/mp-documents/spec30.pdf.
[30]
E. Jason Riedy, Henning Meyerhenke, David Ediger, and David A Bader. 2011. Parallel community detection for massive graphs. In Proceedings of the 9th International Conference on Parallel Processing and Applied Mathematics -- Volume Part I. Springer-Verlag, Berlin Heidelberg, 286--296.
[31]
Jason Riedy, David A. Bader, and Henning Meyerhenke. 2012. Scalable multi-threaded community detection in social networks. In Proceedings of the 2012 IEEE 26th International on Parallel and Distributed Processing Symposium Workshops 8 PhD Forum (IPDPSW’12). IEEE, 1619--1628.
[32]
Martin Rosvall, Daniel Axelsson, and Carl T. Bergstrom. 2009. The map equation. The European Physical Journal Special Topics 178, 1 (2009), 13--23.
[33]
Martin Rosvall and Carl T. Bergstrom. 2011. Multilevel compression of random walks on networks reveals hierarchical organization in large integrated systems. PLoS One 6, 4 (2011), e18209.
[34]
Martin Rosvall, Alcides V. Esquivel, Andrea Lancichinetti, Jevin D. West, and Renaud Lambiotte. 2014. Memory in network flows and its effects on spreading dynamics and community detection. Nature Communications 5 (2014).
[35]
Claude Elwood Shannon. 1948a. A mathematical theory of communication. The Bell System Technical Journal 27, 3 (Jul. 1948), 379--423.
[36]
Claude Elwood Shannon. 1948b. A mathematical theory of communication. The Bell System Technical Journal 27, 4 (Oct. 1948), 623--656.
[37]
Hiroaki Shiokawa, Yasuhiro Fujiwara, and Makoto Onizuka. 2013. Fast algorithm for modularity-based graph clustering. In Proceedings of the 27th AAAI Conference on Artificial Intelligence. 1170--1176.
[38]
C. Staudt and H. Meyerhenke. 2015. Engineering parallel algorithms for community detection in massive networks. IEEE Transactions on Parallel and Distributed Systems, 27, 1 (2016), 171--184.
[39]
Christian L. Staudt, Aleksejs Sazonovs, and Henning Meyerhenke. 2014. NetworKit: A Tool Suite for Large-scale Complex Network Analysis. arXiv:1403.3005, https://arxiv.org/abs/1403.3005.
[40]
Jevin D. West, Theodore C. Bergstrom, and Carl T. Bergstrom. 2010. The eigenfactor metrics<sup>TM</sup>: A network approach to assessing scholarly journals. College 8 Research Libraries 71, 3 (2010), 236--244.
[41]
Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael Franklin, Scott Shenker, and Ion Stoica. 2012. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation. USENIX Association, 2.
[42]
Yanfeng Zhang, Qixin Gao, Lixin Gao, and Cuirong Wang. 2011. Priter: A distributed framework for prioritized iterative computations. In Proceedings of the 2nd ACM Symposium on Cloud Computing. ACM, 13.
[43]
Yuzhou Zhang, Jianyong Wang, Yi Wang, and Lizhu Zhou. 2009. Parallel community detection on large networks with propinquity dynamics. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 997--1006.

Cited By

View all

Index Terms

  1. Scalable and Efficient Flow-Based Community Detection for Large-Scale Graph Analysis

      Recommendations

      Comments

      Information & Contributors

      Information

      Published In

      cover image ACM Transactions on Knowledge Discovery from Data
      ACM Transactions on Knowledge Discovery from Data  Volume 11, Issue 3
      August 2017
      372 pages
      ISSN:1556-4681
      EISSN:1556-472X
      DOI:10.1145/3058790
      Issue’s Table of Contents
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 06 March 2017
      Accepted: 01 August 2016
      Revised: 01 June 2016
      Received: 01 December 2014
      Published in TKDD Volume 11, Issue 3

      Permissions

      Request permissions for this article.

      Check for updates

      Author Tags

      1. Community detection
      2. graph analysis
      3. parallelization
      4. prioritization

      Qualifiers

      • Research-article
      • Research
      • Refereed

      Funding Sources

      Contributors

      Other Metrics

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months)222
      • Downloads (Last 6 weeks)31
      Reflects downloads up to 27 Dec 2024

      Other Metrics

      Citations

      Cited By

      View all

      View Options

      View options

      PDF

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      Login options

      Full Access

      Media

      Figures

      Other

      Tables

      Share

      Share

      Share this Publication link

      Share on social media