DOI: 10.1145/3078597.3078606
research-article

IOGP: An Incremental Online Graph Partitioning Algorithm for Distributed Graph Databases

Published: 26 June 2017

Abstract

Graphs have become increasingly important in many applications and domains such as querying relationships in social networks or managing rich metadata generated in scientific computing. Many of these use cases require high-performance distributed graph databases for serving continuous updates from clients and, at the same time, answering complex queries regarding the current graph. These operations in graph databases, also referred to as online transaction processing (OLTP) operations, have specific design and implementation requirements for graph partitioning algorithms. In this research, we argue it is necessary to consider the connectivity and the vertex degree changes during graph partitioning. Based on this idea, we designed an Incremental Online Graph Partitioning (IOGP) algorithm that responds accordingly to the incremental changes of vertex degree. IOGP helps achieve better locality, generate balanced partitions, and increase the parallelism for accessing high-degree vertices of the graph. Over both real-world and synthetic graphs, IOGP demonstrates as much as 2x better query performance with a less than 10% overhead when compared against state-of-the-art graph partitioning algorithms.



Published In

HPDC '17: Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing
June 2017
254 pages
ISBN: 9781450346993
DOI: 10.1145/3078597

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. distributed storage
  2. graph database
  3. graph partitioning
  4. oltp

Qualifiers

  • Research-article

Conference

HPDC '17

Acceptance Rates

HPDC '17 Paper Acceptance Rate 19 of 100 submissions, 19%;
Overall Acceptance Rate 166 of 966 submissions, 17%
