DOI: 10.1145/2556195.2556213

FENNEL: streaming graph partitioning for massive scale graphs

Published: 24 February 2014

Abstract

Balanced graph partitioning in the streaming setting is a key problem for enabling scalable and efficient computations on massive graph data such as web graphs, knowledge graphs, and graphs arising in the context of online social networks. Two families of heuristics for graph partitioning in the streaming setting are in wide use: place the newly arrived vertex in the cluster with the largest number of neighbors, or in the cluster with the smallest number of non-neighbors.
In this work, we introduce a framework that unifies the two seemingly orthogonal heuristics and allows us to quantify the interpolation between them. More generally, the framework enables a well-principled design of scalable, streaming graph partitioning algorithms that are amenable to distributed implementations. We derive a novel one-pass, streaming graph partitioning algorithm and show that it yields significant performance improvements over previous approaches using an extensive set of real-world and synthetic graphs.
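
The two heuristic families named above can be illustrated with a minimal sketch. This is not the paper's algorithm or its unified objective; it only shows the two greedy placement rules in a one-pass setting. The function name `stream_partition`, the hard `capacity` bound, and the tie-breaking toward the smaller cluster are assumptions made for illustration.

```python
def stream_partition(stream, k, capacity, rule="neighbors"):
    """One-pass greedy placement (illustrative sketch, not the paper's method).

    stream   -- iterable of (vertex, neighbor_list) pairs in arrival order
    k        -- number of clusters
    capacity -- assumed hard limit on vertices per cluster
    rule     -- "neighbors": maximize already-placed neighbors in the cluster
                "non_neighbors": minimize already-placed non-neighbors in it
    """
    assignment = {}      # vertex -> cluster index
    sizes = [0] * k      # current number of vertices per cluster

    for v, nbrs in stream:
        # Count the neighbors of v that were already placed in each cluster.
        nbr_count = [0] * k
        for u in nbrs:
            c = assignment.get(u)
            if c is not None:
                nbr_count[c] += 1

        best, best_key = None, None
        for c in range(k):
            if sizes[c] >= capacity:
                continue  # cluster is full
            if rule == "neighbors":
                score = nbr_count[c]
            elif rule == "non_neighbors":
                score = -(sizes[c] - nbr_count[c])
            else:
                raise ValueError(f"unknown rule: {rule}")
            # Assumed tie-break: prefer the smaller cluster, then lower index.
            key = (score, -sizes[c])
            if best_key is None or key > best_key:
                best, best_key = c, key

        if best is None:
            raise ValueError("total capacity is smaller than the stream")
        assignment[v] = best
        sizes[best] += 1

    return assignment
```

Each vertex is inspected once and never moved afterwards, which is what makes the procedure one-pass. The two rules differ only in the `score` line, which is where an interpolation between them would act.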
Surprisingly, despite the fact that our algorithm is a one-pass streaming algorithm, we found its performance to be in many cases comparable to the de facto standard offline software METIS, and in some cases even superior. For instance, for the Twitter graph with more than 1.4 billion edges, our method partitions the graph in about 40 minutes, achieving a balanced partition that cuts as few as 6.8% of edges, whereas METIS took more than 8½ hours to produce a balanced partition that cuts 11.98% of edges. We also demonstrate the performance gains, with respect to communication cost and runtime, obtained by using our graph partitioner for a standard PageRank computation in a graph processing platform.
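
The quality figures quoted above are fractions of edges cut under a balance requirement. A minimal sketch of how such quantities might be computed for a given assignment follows; the helper names `cut_fraction` and `max_load_ratio` are hypothetical, and the exact balance criterion used in the paper is not stated in this abstract.

```python
from collections import Counter


def cut_fraction(edges, assignment):
    """Fraction of edges whose endpoints lie in different clusters."""
    edges = list(edges)
    cut = sum(1 for u, v in edges if assignment[u] != assignment[v])
    return cut / len(edges)


def max_load_ratio(assignment, k):
    """Largest cluster size divided by the ideal size n / k (1.0 is perfect)."""
    sizes = Counter(assignment.values())
    return max(sizes.values()) / (len(assignment) / k)
```

For example, two triangles joined by a single edge and split one triangle per cluster give a cut fraction of 1/7 and a load ratio of 1.0.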

Published In

WSDM '14: Proceedings of the 7th ACM international conference on Web search and data mining
February 2014
712 pages
ISBN:9781450323512
DOI:10.1145/2556195

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. balanced graph partitioning
  2. distributed computing
  3. streaming

Qualifiers

  • Research-article

Conference

WSDM 2014

Acceptance Rates

WSDM '14 Paper Acceptance Rate: 64 of 355 submissions, 18%
Overall Acceptance Rate: 498 of 2,863 submissions, 17%
