research-article

PGX.D: a fast distributed graph processing engine

Authors: Siegfried Depner, Thomas Manhardt, Jan Van Der Lugt, Merijn Verstraaten, Hassan Chafi

SC '15: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis

Article No.: 58, Pages 1 - 12

https://doi.org/10.1145/2807591.2807620

Published: 15 November 2015

Abstract

Graph analysis is a powerful method in data analysis. Although several frameworks have been proposed for processing large graph instances in distributed environments, their performance is much lower than that of efficient single-machine implementations given enough memory. In this paper, we present PGX.D, a fast distributed graph processing system. We show that PGX.D significantly outperforms other distributed graph systems such as GraphLab (3x to 90x). Furthermore, PGX.D on 4 to 16 machines is also faster than an implementation optimized for single-machine execution. Using a fast cooperative context-switching mechanism, we implement PGX.D as a low-overhead, bandwidth-efficient communication framework that supports remote data-pulling patterns. Moreover, PGX.D achieves large traffic reduction and good workload balance by applying selective ghost nodes, edge partitioning, and edge chunking transparently to the user. Our analysis confirms that each of these features is crucial for the overall performance of certain kinds of graph algorithms. Finally, we advocate the use of balanced beefy clusters, in which the aggregate sustained random DRAM-access bandwidth is matched with the bandwidth of the underlying interconnection fabric.
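
The abstract names edge partitioning and selective ghost nodes without describing how they work. The following is a minimal sketch of one plausible reading, not PGX.D's actual implementation: vertices are split into contiguous ranges so that each machine holds a similar number of edges rather than a similar number of vertices, and vertices above a degree threshold are marked for replication as ghosts. The function names `edge_balanced_ranges` and `select_ghosts`, the contiguous-range layout, and the fixed degree threshold are all assumptions made for illustration.

```python
from bisect import bisect_left
from itertools import accumulate


def edge_balanced_ranges(degrees, num_parts):
    """Split vertices 0..n-1 into contiguous ranges with similar edge counts.

    Hypothetical helper: 'degrees[v]' is the number of edges owned by vertex v.
    """
    prefix = list(accumulate(degrees))  # prefix[i] = edges of vertices 0..i
    total = prefix[-1] if prefix else 0
    bounds = [0]
    for p in range(1, num_parts):
        target = total * p / num_parts
        # One past the first vertex whose cumulative edge count reaches the target.
        cut = bisect_left(prefix, target) + 1
        # Keep boundaries monotone and inside the vertex range.
        cut = min(max(cut, bounds[-1]), len(degrees))
        bounds.append(cut)
    bounds.append(len(degrees))
    return [(bounds[i], bounds[i + 1]) for i in range(num_parts)]


def select_ghosts(degrees, threshold):
    """Pick high-degree vertices to replicate on every machine (assumed policy)."""
    return {v for v, d in enumerate(degrees) if d >= threshold}


if __name__ == "__main__":
    degrees = [8, 1, 1, 1, 1, 2, 2]  # one hub vertex and six low-degree vertices
    for lo, hi in edge_balanced_ranges(degrees, 2):
        print((lo, hi), "edges:", sum(degrees[lo:hi]))
    print("ghosts:", sorted(select_ghosts(degrees, 8)))
```

In this toy input the hub vertex alone fills one partition, which is the kind of skew that splitting by vertex count would miss. Replicating such a hub as a ghost lets other machines read its data locally instead of sending a message per incident edge, at the cost of keeping the copies in sync.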




Published In


SC '15: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis

November 2015

985 pages

ISBN:9781450337236

DOI:10.1145/2807591

General Chair: Jackie Kern, University of Illinois at Urbana-Champaign, Urbana, Illinois
Program Chair: Jeffrey S. Vetter, Oak Ridge National Laboratory and Georgia Institute of Technology, Oak Ridge, Tennessee

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

Research-article

Conference

SC15

Sponsor:

SIGHPC
SIGARCH
IEEE-CS

SC15: The International Conference for High Performance Computing, Networking, Storage and Analysis

November 15 - 20, 2015

Austin, Texas

Acceptance Rates

SC '15 Paper Acceptance Rate 79 of 358 submissions, 22%;

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%


Bibliometrics

Total Citations: 75
Total Downloads: 962
Downloads (Last 12 months): 47
Downloads (Last 6 weeks): 6

Reflects downloads up to 24 Dec 2024


Cited By

Han D, Zhu H, Chen W, Pan R, Liu Y, Zhou J, Feng H, Zhang T, Wang X, Zhu M, Tao J, Fan C, Zhang X (2024). GraphFederator: Federated Visual Analysis for Multi-party Graphs. 2024 IEEE 17th Pacific Visualization Conference (PacificVis), pages 172-181. Online publication date: 23-Apr-2024.
https://doi.org/10.1109/PacificVis60374.2024.00027

Firmli S, Chiadmi D (2023). A Scalable Data Structure for Efficient Graph Analytics and In-Place Mutations. Data, 8:11 (166). Online publication date: 3-Nov-2023.
https://doi.org/10.3390/data8110166

Besta M, Gerstenberger R, Peter E, Fischer M, Podstawski M, Barthels C, Alonso G, Hoefler T (2023). Demystifying Graph Databases: Analysis and Taxonomy of Data Organization, System Designs, and Graph Queries. ACM Computing Surveys, 56:2 (1-40). Online publication date: 15-Sep-2023.
https://dl.acm.org/doi/10.1145/3604932

Jamshidi K, Xu H, Vora K, Fedorova A, Narayanan D, Di Luna G, Querzoni L (2023). Accelerating Graph Mining Systems with Subgraph Morphing. Proceedings of the Eighteenth European Conference on Computer Systems, pages 162-181. Online publication date: 8-May-2023.
https://dl.acm.org/doi/10.1145/3552326.3567489

Zhang C, Bonifati A, Kapp H, Haprian V, Lozi J (2023). A Reachability Index for Recursive Label-Concatenated Graph Queries. 2023 IEEE 39th International Conference on Data Engineering (ICDE), pages 67-81. Online publication date: Apr-2023.
https://doi.org/10.1109/ICDE55515.2023.00013

Berdai A, Chiadmi D (2023). Attempts in Worst-Case Optimal Joins on Relational Data Systems: A Literature Survey. 2023 IEEE 6th International Conference on Cloud Computing and Artificial Intelligence: Technologies and Applications (CloudTech), pages 01-08. Online publication date: 21-Nov-2023.
https://doi.org/10.1109/CloudTech58737.2023.10366077

Boukham H, Wachsmuth G, Dwars M, Chiadmi D, Fischer B, Burgueño L, Cazzola W (2022). A Multi-target, Multi-paradigm DSL Compiler for Algorithmic Graph Processing. Proceedings of the 15th ACM SIGPLAN International Conference on Software Language Engineering, pages 2-15. Online publication date: 29-Nov-2022.
https://dl.acm.org/doi/10.1145/3567512.3567513

Jamshidi K, Mariappan M, Vora K, Kalavri V, Salihoğlu S (2022). Anti-vertex for neighborhood constraints in subgraph queries. Proceedings of the 5th ACM SIGMOD Joint International Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA), pages 1-9. Online publication date: 12-Jun-2022.
https://dl.acm.org/doi/10.1145/3534540.3534690

Liu T, Li D (2022). EndGraph: An Efficient Distributed Graph Preprocessing System. 2022 IEEE 42nd International Conference on Distributed Computing Systems (ICDCS), pages 111-121. Online publication date: Jul-2022.
https://doi.org/10.1109/ICDCS54860.2022.00020

Jamshidi K, Vora K (2021). A Deeper Dive into Pattern-Aware Subgraph Exploration with PEREGRINE. ACM SIGOPS Operating Systems Review, 55:1 (1-10). Online publication date: 6-Jun-2021.
https://dl.acm.org/doi/10.1145/3469379.3469381
