DOI: 10.5555/2387880.2387883
Article

PowerGraph: distributed graph-parallel computation on natural graphs

Published: 08 October 2012

Abstract

Large-scale graph-structured computation is central to tasks ranging from targeted advertising to natural language processing and has led to the development of several graph-parallel abstractions, including Pregel and GraphLab. However, the natural graphs commonly found in the real world have highly skewed power-law degree distributions, which challenge the assumptions made by these abstractions, limiting performance and scalability.
In this paper, we characterize the challenges of computation on natural graphs in the context of existing graph-parallel abstractions. We then introduce the PowerGraph abstraction, which exploits the internal structure of graph programs to address these challenges. Leveraging the PowerGraph abstraction, we introduce a new approach to distributed graph placement and representation that exploits the structure of power-law graphs. We provide a detailed analysis and experimental evaluation comparing PowerGraph to two popular graph-parallel systems. Finally, we describe three different implementation strategies for PowerGraph and discuss their relative merits, with empirical evaluations on large-scale real-world problems demonstrating order-of-magnitude gains.

Published In

OSDI'12: Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation
October 2012
362 pages
ISBN: 9781931971966

Sponsors

  • Infosys
  • EMC2
  • Microsoft Research
  • ORACLE
  • USENIX Assoc

Publisher

USENIX Association

United States
