skip to main content
10.1145/2835776.2835829acmconferencesArticle/Chapter ViewAbstractPublication PageswsdmConference Proceedingsconference-collections
research-article
Open access

Distributed Balanced Partitioning via Linear Embedding

Published: 08 February 2016 Publication History

Abstract

Balanced partitioning is often a crucial first step in solving large-scale graph optimization problems: in some cases, a big graph is chopped into pieces that fit on one machine to be processed independently before stitching the results together, leading to certain suboptimality from the interaction among different pieces. In other cases, links between different parts may show up in the running time and/or network communications cost, hence the desire to have small cut size.
We study a distributed balanced partitioning problem where the goal is to partition the vertices of a given graph into k pieces, minimizing the total cut size. Our algorithm is composed of a few steps that are easily implementable in distributed computation frameworks, e.g., MapReduce. The algorithm first embeds nodes of the graph onto a line, and then processes nodes in a distributed manner guided by the linear embedding order. We examine various ways to find the first embedding, e.g., via a hierarchical clustering or Hilbert curves. Then we apply four different techniques such as local swaps, minimum cuts on partition boundaries, as well as contraction and dynamic programming.
Our empirical study compares the above techniques with each other, and to previous work in distributed algorithms, e.g., a label propagation method, FENNEL and Spinner. We report our results both on a private map graph and several public social networks, and show that our results beat previous distributed algorithms: we notice, e.g., 15-25% reduction in cut size over [UB13]. We also observe that our algorithms allow for scalable distributed implementation for any number of partitions. Finally, we apply our techniques for the Google Maps Driving Directions to minimize the number of multi-shard queries with the goal of saving in CPU usage. During live experiments, we observe an ≈ 40% drop in the number of multi-shard queries when comparing our method with a standard geography-based method.

References

[1]
Public data for balanced partioning paper. http://goo.gl/okvwpa, 2015.
[2]
Andreev, K., and Räcke, H. Balanced graph partitioning. Theory Comput. Syst. 39, 6 (2006), 929--939.
[3]
Aydin, K., Bateni, M., and Mirrokni, V. Distributed balanced partitioning via linear embedding. CoRR abs/1512.02727 (2015).
[4]
Boldi, P., Rosa, M., Santini, M., and Vigna, S. Layered label propagation: a multiresolution coordinate-free ordering for compressing social networks. In WWW (2011), pp. 587--596.
[5]
Charikar, M., Hajiaghayi, M. T., Karloff, H. J., and Rao, S. l22 spreading metrics for vertex ordering problems. Algorithmica 56, 4 (2010), 577--604.
[6]
Chierichetti, F., Kumar, R., Lattanzi, S., Mitzenmacher, M., Panconesi, A., and Raghavan, P. On compressing social networks. In KDD (2009), pp. 219--228.
[7]
Ching, A., and Kunz, C. Giraph : Large-scale graph processing on hadoop. In Hadoop Summit (2010).
[8]
Dean, J., and Ghemawat, S. Mapreduce: Simplified data processing on large clusters. In OSDI (2004), pp. 137--150.
[9]
Delling, D., Goldberg, A. V., Razenshteyn, I., and Werneck, R. F. F. Graph partitioning with natural cuts. In IPDPS (2011), pp. 1135--1146.
[10]
Delling, D., Goldberg, A. V., Razenshteyn, I., and Werneck, R. F. F. Exact combinatorial branch-and-bound for graph bisection. In ALENEX (2012), pp. 30--44.
[11]
Dongarra, J., Foster, I., Fox, G., Gropp, W., Kennedy, K., Torczon, L., and White, A. The Sourcebook of Parallel Computing. Morgan Kaufmann Publishers Inc., 2003.
[12]
Feige, U., and Krauthgamer, R. A polylogarithmic approximation of the minimum bisection. SIAM J. Comput. 31, 4 (2002), 1090--1118.
[13]
Feige, U., and Lee, J. R. An improved approximation ratio for the minimum linear arrangement problem. Inf. Process. Lett. 101, 1 (2007), 26--29.
[14]
Fishburn, P. C., Tetali, P., and Winkler, P. Optimal linear arrangement of a rectangular grid. Discrete Mathematics 213, 1-3 (2000), 123--139.
[15]
Garey, M. R., and Johnson, D. S. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman, 1979.
[16]
Goldschmidt, O., and Hochbaum, D. S. Polynomial algorithm for the k-cut problem. In FOCS (1988), pp. 444--451.
[17]
Gotsman, C., and Lindenbaum, M. On the metric properties of discrete space-filling curves. IEEE Transactions on Image Processing 5, 5 (1996), 794--797.
[18]
Kang, U., Tsourakakis, C. E., and Faloutsos, C. PEGASUS: A Peta-Scale Graph Mining System- Implementation and Observations.
[19]
Karypis, G., and Kumar, V. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comput. 20, 1 (1998), 359--392.
[20]
Kiveris, R., Lattanzi, S., Mirrokni, V., Rastogi, V., and Vassilvitskii, S. Connected components in mapreduce and beyond. In SOCC (2014).
[21]
Kwak, H., Lee, C., Park, H., and Moon, S. What is Twitter, a social network or a news media? In WWW (2010), pp. 591--600.
[22]
Malewicz, G., Austern, M. H., Bik, A. J., Dehnert, J. C., Horn, I., Leiser, N., and Czajkowski, G. Pregel: a system for large-scale graph processing. In SIGMOD (2010).
[23]
Martella, C., Logothetis, D., and Siganos, G. Spinner: Scalable graph partitioning for the cloud. CoRR abs/1404.3861 (2014).
[24]
Moon, B., Jagadish, H. V., Faloutsos, C., and Saltz, J. H. Analysis of the clustering properties of the hilbert space-filling curve. IEEE TKDE 13 (2001).
[25]
Niedermeier, R., Reinhardt, K., and Sanders, P. Towards optimal locality in mesh-indexings. Discrete Applied Mathematics 117, 1-3 (2002), 211--237.
[26]
Rao, S., and Richa, A. W. New approximation techniques for some linear ordering problems. SIAM J. Comput. 34, 2 (2004), 388--404.
[27]
Rastogi, V., Machanavajjhala, A., Chitnis, L., and Sarma, A. D. Finding connected components in map-reduce in logarithmic rounds. In ICDE (2013), pp. 50--61.
[28]
Sokal, R. R., and Michener, C. D. A statistical method for evaluating systematic relationships. U. Kansas Science Bulletin 38 (1958), 1409--1438.
[29]
Stanton, I. Streaming balanced graph partitioning algorithms for random graphs. In SODA (2014), pp. 1287--1301.
[30]
Stanton, I., and Kliot, G. Streaming graph partitioning for large distributed graphs. In KDD (2012), pp. 1222--1230.
[31]
Strongin, R. G., and Sergeyev, Y. D. Global optimization with non-convex constraints: sequential and parallel algorithms. Nonconvex Optimization and Its Applications. Kluwer academic publishers, 2000.
[32]
Tsourakakis, C. E., Gkantsidis, C., Radunovic, B., and Vojnovic, M. FENNEL: streaming graph partitioning for massive scale graphs. In WSDM (2014), pp. 333--342.
[33]
Tsourakakis, C. E., Kolountzakis, M. N., and Miller, G. L. Approximate triangle counting. CoRR abs/0904.3761 (2009).
[34]
Ugander, J., and Backstrom, L. Balanced label propagation for partitioning massive graphs. In WSDM (2013), pp. 507--516.
[35]
Yan, D., Cheng, J., Lu, Y., and Ng, W. Blogel: A block-centric framework for distributed computation on real-world graphs. PVLDB 7, 14 (2014), 1981--1992.

Cited By

View all

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
WSDM '16: Proceedings of the Ninth ACM International Conference on Web Search and Data Mining
February 2016
746 pages
ISBN:9781450337168
DOI:10.1145/2835776
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 February 2016

Check for updates

Author Tags

  1. cut minimization
  2. embedding to line
  3. imbalance
  4. local improvement
  5. mapreduce
  6. maps
  7. partitioning
  8. social networks

Qualifiers

  • Research-article

Conference

WSDM 2016
WSDM 2016: Ninth ACM International Conference on Web Search and Data Mining
February 22 - 25, 2016
California, San Francisco, USA

Acceptance Rates

WSDM '16 Paper Acceptance Rate 67 of 368 submissions, 18%;
Overall Acceptance Rate 498 of 2,863 submissions, 17%

Upcoming Conference

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)200
  • Downloads (Last 6 weeks)25
Reflects downloads up to 30 Dec 2024

Other Metrics

Citations

Cited By

View all

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media