Distributed subgraph matching on timely dataflow

Published: 01 June 2019

Abstract

Many distributed algorithms have recently emerged that aim to solve subgraph matching at scale. Existing algorithm-level comparisons have failed to provide a systematic view of distributed subgraph matching, mainly because strategy and optimization are intertwined in each algorithm. In this paper, we identify four strategies and three general-purpose optimizations from representative state-of-the-art algorithms. We implement the four strategies, together with the optimizations, on the common Timely dataflow system to enable a systematic strategy-level comparison. Our implementation covers all representative algorithms. We conduct extensive experiments on both unlabelled and labelled matching to analyze the performance of distributed subgraph matching under various settings, and we summarize the findings as a practical guide.

Published In

Proceedings of the VLDB Endowment  Volume 12, Issue 10
June 2019
177 pages
ISSN:2150-8097

Publisher

VLDB Endowment

Qualifiers

  • Research-article
