skip to main content
10.1145/3327964.3328498acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

Cut to Fit: Tailoring the Partitioning to the Computation

Published: 30 June 2019 Publication History

Abstract

Graph analytics applications are very often built using off-the-shelf analytics frameworks, which are profiled and optimized for the general case and have to perform for all kind of graphs. As performance is affected by the selection of the partition strategy analytics frameworks often offer a selection of partitioning algorithms. In this paper we evaluate the impact of partitioning strategies on the performance of graph computations. We evaluate eight graph partitioning algorithms on a diverse set of graph datasets, using four standard graph algorithms by measuring a set of five partitioning metrics.
We analyze the performance of each partitioning strategy with respect to (i) the properties of the graph dataset, (ii) each analytics computation and (iii) the number of partitions. We confirm that there is no optimal partitioner across all experiments and moreover, find no metric always correlated with performance, that could be targeted by novel partitioners. Finally, we find that partitioning time may become a significant part of total time, and investing a lot of time to approximate perfect partitioning may not be worth it. We propose that a "good enough" strategy may be to have a very fast (and locally computable) heuristic to select among the best performing partitioners for any given problem instance. We demonstrate this by proposing PARSEL, a very simple partitioner selector. PARSEL selects among the two best-performing partitioners very fast; we demonstrate that even such a simple heuristic can outperform either partitioning strategy alone.

References

[1]
Apache giraph. http://giraph.apache.org/.
[2]
A. Abou-Rjeili and G. Karypis. Multilevel algorithms for partitioning power-law graphs. In Proceedings of the 20th International Conference on Parallel and Distributed Processing, IPDPS'06, pages 124--124. IEEE Computer Society, Apr. 2006. http://dl.acm.org/citation.cfm?id=1898953.1899055.
[3]
L. Backstrom, D. Huttenlocher, J. Kleinberg, and X. Lan. Group formation in large social networks: Membership, growth, and evolution. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD'06, pages 44--54. ACM, Aug. 2006.
[4]
A. Ching, S. Edunov, M. Kabiljo, D. Logothetis, and S. Muthukrishnan. One trillion edges: Graph processing at facebook-scale. Proceedings of the VLDB Endowment, 8(12):1804--1815, Aug. 2015.
[5]
J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. Powergraph: Distributed graph-parallel computation on natural graphs. In Proceedings on the 10th USENIX Conference on Operating Systems Design and Implementation, OSDI'12, pages 17--30. USENIX Association, Oct. 2012. http://dl.acm.org/citation.cfm?id=2387880.2387883.
[6]
M. Han, K. Daudjee, K. Ammar, M. T. Özsu, X. Wang, and T. Jin. An experimental comparison of pregel-like graph processing systems. Proceedings of the VLDB Endowment, 7(12):1047--1058, Aug. 2014.
[7]
G. Karypis and V. Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM Journal on Scientific Computing, 20(1):359--392, Dec. 1998.
[8]
Z. Khayyat, K. Awara, A. Alonazi, H. Jamjoom, D. Williams, and P. Kalnis. Mizan: A system for dynamic load balancing in large-scale graph processing. In Proceedings of the 8th ACM European Conference on Computer Systems, EuroSys '13, pages 169--182. ACM, Apr. 2013.
[9]
R. Kumar, A. Abelló, and T. Calders. Cost model for pregel on graphx. In Advances in Databases and Information Systems ADBIS'17, volume 10509 of Lecture Notes in Computer Science, pages 153--166. Springer, Cham, Aug. 2017.
[10]
J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics, 6:29--123, Jan. 2009.
[11]
Y. Low, D. Bickson, J. Gonzalez, C. Guestrin, A. Kyrola, and J. M. Hellerstein. Distributed graphlab: a framework for machine learning and data mining in the cloud. Proceedings of the VLDB Endowment, 5(8):716--727, Apr. 2012.
[12]
G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: A system for large-scale graph processing. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, SIGMOD'10, pages 135--146. ACM, June 2010.
[13]
H. Mykhailenko, F. Huet, and G. Neglia. Comparison of edge partitioners for graph processing. In International Conference on Computational Science and Computational Intelligence, CSCI'16, pages 441--446. IEEE, Dec. 2016.
[14]
H. Mykhailenko, G. Neglia, and F. Huet. Which metrics for vertex-cut partitioning? In 11th International Conference for Internet Technology and Secured Transactions, ICITST'16, pages 74--79. IEEE, Dec. 2016.
[15]
P. Pratikakis. twawler: A lightweight twitter crawler. CoRR, abs/1804.07748, Aug. 2018. http://arxiv.org/abs/1804.07748.
[16]
F. Rahimian, A. H. Payberah, S. Girdzijauskas, and S. Haridi. Distributed vertex-cut partitioning. In Proceedings of the 14th IFIP WG 6.1 International Conference on Distributed Applications and Interoperable Systems, volume 8460, pages 186--200. Springer-Verlag, June 2014.
[17]
R. A. Rossi and N. K. Ahmed. The network data repository with interactive graph analytics and visualization. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, AAAI'15, pages 4292--4293. AAAI Press, Jan. 2015. http://dl.acm.org/citation.cfm?id=2888116.2888372.
[18]
A. Roy, L. Bindschaedler, J. Malicevic, and W. Zwaenepoel. Chaos: Scale-out graph processing from secondary storage. In Symposium on Operating Systems Principles, SOSP'15, pages 410--424. ACM, Oct. 2015.
[19]
S. Salihoglu and J. Widom. Gps: A graph processing system. In Proceedings of the 25th International Conference on Scientific and Statistical Database Management, SSDBM, pages 22:1--22:12. ACM, July 2013.
[20]
N. Satish, N. Sundaram, M. M. A. Patwary, J. Seo, J. Park, M. A. Hassaan, S. Sengupta, Z. Yin, and P. Dubey. Navigating the maze of graph analytics frameworks using massive graph datasets. In Management of Data, SIGMOD'14, pages 979--990. ACM, June 2014.
[21]
I. Stanton and G. Kliot. Streaming graph partitioning for large distributed graphs. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD'12, pages 1222--1230. ACM, Aug. 2012.
[22]
J. Sun, H. Vandierendonck, and D. S. Nikolopoulos. Graphgrind: Addressing load imbalance of graph partitioning. In Proceedings of the International Conference on Supercomputing, ICS'17, pages 16:1--16:10. ACM, June 2017.
[23]
L. Takac and M. Zabovsky. Data analysis in public social networks. In International Scientific Conference and International Workshop Present Day Trends of Innovations 2012, volume 1, May 2012.
[24]
C. Tsourakakis, C. Gkantsidis, B. Radunovic, and M. Vojnovic. Fennel: Streaming graph partitioning for massive scale graphs. In Proceedings of the 7th ACM International Conference on Web Search and Data Mining, WSDM'14, pages 333--342. ACM, Feb. 2014.
[25]
L. G. Valiant. A bridging model for parallel computation. Communications of the ACM, 33(8):103--111, Aug. 1990.
[26]
S. Verma, L. M. Leslie, Y. Shin, and I. Gupta. An experimental comparison of partitioning strategies in distributed graph processing. Proceedings of the VLDB Endowment, 10(5):493--504, Jan. 2017.
[27]
R. S. Xin, J. E. Gonzalez, M. J. Franklin, and I. Stoica. Graphx: A resilient distributed graph system on spark. In Graph Data Management Experiences and Systems, GRADES'13, pages 2:1--2:6. ACM, June 2013.
[28]
J. Yang and J. Leskovec. Defining and evaluating network communities based on ground-truth. Knowledge and Information Systems, 42(1):181--213, Jan. 2015.
[29]
L. Yucheng, G. Joseph, K. Aapo, B. Danny, G. Carlos, and M. Joseph. GraphLab: A new framework for parallel machine learning. In Conference on Uncertainty in Artificial Intelligence, UAI'10, pages 340--349. AUAI Press, July 2010. https://dslpitt.org/uai/displayArticles.jsp?mmnu=1&smnu=1&proceeding_id=26.

Cited By

View all
  1. Cut to Fit: Tailoring the Partitioning to the Computation

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    GRADES-NDA'19: Proceedings of the 2nd Joint International Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA)
    June 2019
    88 pages
    ISBN:9781450367899
    DOI:10.1145/3327964
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 30 June 2019

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. Evaluation
    2. GraphX
    3. Partitioning

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    SIGMOD/PODS '19
    Sponsor:
    SIGMOD/PODS '19: International Conference on Management of Data
    June 30 - July 5, 2019
    Amsterdam, Netherlands

    Acceptance Rates

    GRADES-NDA'19 Paper Acceptance Rate 10 of 20 submissions, 50%;
    Overall Acceptance Rate 29 of 61 submissions, 48%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)4
    • Downloads (Last 6 weeks)1
    Reflects downloads up to 14 Sep 2024

    Other Metrics

    Citations

    Cited By

    View all

    View Options

    Get Access

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media