Scalable subgraph enumeration in MapReduce: a cost-oriented approach

Published: 01 June 2017

Abstract

Subgraph enumeration, which aims to find all the subgraphs of a large data graph that are isomorphic to a given pattern graph, is a fundamental graph problem with a wide range of applications. However, existing sequential algorithms for subgraph enumeration fall short in handling large graphs because they rely on computationally intensive subgraph isomorphism operations. Recent research has therefore focused on solving the problem using MapReduce. Nevertheless, existing MapReduce approaches do not scale to very large graphs, since they either produce a huge number of partial results or consume a large amount of memory. Motivated by this, in this paper, we propose a new algorithm, TwinTwigJoin, based on a left-deep-join framework in MapReduce, in which the basic join unit is a TwinTwig (an edge or two incident edges of a node). We show that in the Erdős-Rényi random graph model, TwinTwigJoin is instance optimal in the left-deep-join framework under reasonable assumptions, and we devise an algorithm to compute the optimal join plan. We further discuss how our approach can be adapted to handle the power-law random graph model. Three optimization strategies are explored to improve our algorithm. Finally, by aggregating equivalent nodes into a compressed node, we construct a compressed graph, on which subgraph enumeration is further improved. We conduct extensive performance studies on several real graphs, one of which contains billions of edges. Our approach significantly outperforms existing solutions in all tests.
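
To make the join unit concrete, the following is a minimal single-machine sketch of the idea, not the paper's MapReduce implementation. It splits a pattern into TwinTwigs (a center node with one or two incident edges), matches each TwinTwig in the data graph, and combines the matches with a left-deep join. All function names are hypothetical. The decomposition is a simple greedy heuristic standing in for the cost-based optimal join plan, and the join is a nested loop standing in for a distributed join.

```python
from itertools import permutations


def decompose(pattern_edges):
    """Greedily split the pattern's edges into TwinTwigs.

    Each TwinTwig is (center, leaves) with one or two leaves, and every
    pattern edge lands in exactly one TwinTwig. This greedy rule is an
    assumption of the sketch, not the paper's cost-based plan.
    """
    remaining = {frozenset(e) for e in pattern_edges}
    twigs = []
    while remaining:
        degree = {}
        for edge in remaining:
            for v in edge:
                degree[v] = degree.get(v, 0) + 1
        # Highest remaining degree first; ties broken by vertex name.
        center = max(sorted(degree), key=degree.get)
        leaves = sorted(next(iter(e - {center}))
                        for e in remaining if center in e)[:2]
        for leaf in leaves:
            remaining.discard(frozenset((center, leaf)))
        twigs.append((center, leaves))
    return twigs


def match_twig(adj, center, leaves):
    """Yield every mapping of one TwinTwig into the data graph."""
    for u, nbrs in adj.items():
        # Ordered selections of distinct neighbours, one per leaf.
        for combo in permutations(sorted(nbrs), len(leaves)):
            mapping = {center: u}
            mapping.update(zip(leaves, combo))
            yield mapping


def join(partials, matches):
    """Nested-loop join on shared pattern vertices, keeping injective results."""
    matches = list(matches)
    out = []
    for p in partials:
        for m in matches:
            if all(p[k] == v for k, v in m.items() if k in p):
                merged = {**p, **m}
                if len(set(merged.values())) == len(merged):
                    out.append(merged)
    return out


def enumerate_embeddings(adj, pattern_edges):
    """Left-deep join: fold the TwinTwig matches in one at a time."""
    twigs = decompose(pattern_edges)
    center, leaves = twigs[0]
    partials = list(match_twig(adj, center, leaves))
    for center, leaves in twigs[1:]:
        partials = join(partials, match_twig(adj, center, leaves))
    return partials


# Data graph: a triangle 0-1-2 with a pendant edge 2-3 (undirected, no self-loops).
data_graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
triangle = [("a", "b"), ("b", "c"), ("a", "c")]

embeddings = enumerate_embeddings(data_graph, triangle)
subgraphs = {frozenset(m.values()) for m in embeddings}
print(len(embeddings), len(subgraphs))  # 6 1
```

The sketch returns every embedding, so the single triangle is reported six times, once per automorphism of the pattern. Even on this toy input, the first TwinTwig produces more partial matches than survive the join, which is the kind of intermediate-result growth that a cost-based join plan aims to control.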

    Published In

    The VLDB Journal — The International Journal on Very Large Data Bases  Volume 26, Issue 3
    June 2017
    163 pages

    Publisher

    Springer-Verlag

    Berlin, Heidelberg

    Author Tags

    1. MapReduce
    2. Power-law graph
    3. Random graph
    4. Subgraph enumeration
