Inexact subgraph isomorphism in MapReduce

Published: 01 February 2013

Abstract

Inexact subgraph matching based on type-isomorphism was introduced by Berry et al. [J. Berry, B. Hendrickson, S. Kahan, P. Konecny, Software and algorithms for graph queries on multithreaded architectures, in: Proc. IEEE International Parallel and Distributed Processing Symposium, IEEE, 2007, pp. 1-14] as a generalization of the exact subgraph matching problem. Enumerating small subgraph patterns in very large graphs is a core problem in the analysis of social networks, bioinformatics data sets, and other applications. This paper describes a MapReduce algorithm for subgraph type-isomorphism matching. The MapReduce computing framework is designed for distributed computing on massive data sets, and the new algorithm leverages MapReduce techniques to enable processing of graphs with billions of vertices. The paper also introduces a new class of walk-level constraints for narrowing the set of matches. Constraints meeting criteria defined in the paper are useful both for specifying more precise patterns and for improving algorithm performance. Results are provided on a variety of graphs, with sizes ranging up to billions of vertices and edges, including graphs that follow a power-law degree distribution.
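As a rough illustration of how walk-based pattern matching can be phrased in MapReduce terms, the following is a minimal, in-memory sketch and not the algorithm from the paper. It assumes a simplified notion of a match: a data walk matches a template walk when vertex and edge types agree position by position, so a data vertex may be reused. The precise definition of type-isomorphism is the one given by Berry et al. and may differ. Each round extends partial walks by one edge: a map phase keys partial walks and candidate edges by vertex, a shuffle groups them, and a reduce phase joins them while applying a walk-level constraint. The graph, the template, the `distinct_vertices` constraint, and every function name are hypothetical.

```python
"""Toy, in-memory simulation of extending typed walks with map/shuffle/reduce.

Illustrative only: the data graph, the template walk, the constraint and
all function names are hypothetical and are not taken from the paper.
"""
from collections import defaultdict

# Hypothetical typed data graph: vertex id -> vertex type, and typed directed edges.
VERTEX_TYPE = {1: "person", 2: "person", 3: "person", 4: "company"}
EDGES = [
    (1, 2, "knows"),
    (2, 1, "knows"),
    (2, 3, "knows"),
    (2, 4, "knows"),     # target type "company" does not fit the template below
    (3, 4, "works_at"),  # edge type "works_at" does not fit the template below
]

# Hypothetical template walk: vertex types and the edge type between consecutive vertices.
TEMPLATE_VERTICES = ["person", "person", "person"]
TEMPLATE_EDGES = ["knows", "knows"]


def map_phase(walks, step):
    """Emit (key, tagged value) pairs for one extension round."""
    for walk in walks:
        # Key each partial walk by its last vertex, where it will be extended.
        yield walk[-1], ("walk", walk)
    for u, v, etype in EDGES:
        # Key each edge by its source, keeping only edges whose edge type and
        # target vertex type fit the next step of the template.
        if etype == TEMPLATE_EDGES[step] and VERTEX_TYPE[v] == TEMPLATE_VERTICES[step + 1]:
            yield u, ("edge", v)


def shuffle(pairs):
    """Group values by key, as the MapReduce runtime does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, constraint):
    """Join each partial walk with each fitting edge that leaves its last vertex."""
    for values in groups.values():
        walks = [payload for tag, payload in values if tag == "walk"]
        targets = [payload for tag, payload in values if tag == "edge"]
        for walk in walks:
            for v in targets:
                extended = walk + (v,)
                # Apply the walk-level constraint inside the join, so a violating
                # candidate is never emitted into the next round.
                if constraint(extended):
                    yield extended


def distinct_vertices(walk):
    """Hypothetical walk-level constraint: no data vertex appears twice in the walk."""
    return len(set(walk)) == len(walk)


def match(constraint=lambda walk: True):
    """Run one map/shuffle/reduce round per template edge and return sorted matches."""
    # Seed round: every data vertex whose type fits the first template vertex.
    walks = [(v,) for v, t in VERTEX_TYPE.items() if t == TEMPLATE_VERTICES[0]]
    for step in range(len(TEMPLATE_EDGES)):
        walks = list(reduce_phase(shuffle(map_phase(walks, step)), constraint))
    return sorted(walks)


print(match())                   # [(1, 2, 1), (1, 2, 3), (2, 1, 2)]
print(match(distinct_vertices))  # [(1, 2, 3)]
```

In the unconstrained run, (1, 2, 1) and (2, 1, 2) count as matches because only types are compared along the walk. The hypothetical `distinct_vertices` constraint removes them inside the reducer, which also shrinks the set of partial walks carried into later rounds. On a real cluster, each round would typically run as its own MapReduce job, with the shuffle performed by the framework.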

References

[1] Apache Hadoop, Apache Hadoop project, 2011. http://hadoop.apache.org/.
[2] J. Berry, Practical heuristics for inexact subgraph isomorphism, Technical Report SAND2011-6558W, Sandia National Laboratories, Albuquerque, NM, 2011.
[3] J. Berry, B. Hendrickson, S. Kahan, P. Konecny, Software and algorithms for graph queries on multithreaded architectures, in: Proc. IEEE International Parallel and Distributed Processing Symposium, IEEE, 2007, pp. 1-14.
[4] J.W. Berry, D.J. Nordman, C.A. Phillips, A.G. Wilson, Listing triangles in expected linear time on a class of power law graphs, Technical Report SAND2010-4474C, Sandia National Laboratories, Albuquerque, NM, 2010.
[5] M. Bröcheler, A. Pugliese, V. Subrahmanian, DOGMA: a disk-oriented graph matching algorithm for RDF databases, in: 8th International Semantic Web Conference, ISWC 2009, 2009, pp. 97-113.
[6] M. Bröcheler, A. Pugliese, V. Subrahmanian, COSI: cloud oriented subgraph identification in massive social networks, in: 2010 International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2010, 2010, pp. 248-255.
[7] H. Bunke, G. Allermann, Inexact graph matching for structural pattern recognition, Pattern Recognition Letters, pp. 245-253.
[8] D. Chakrabarti, Y. Zhan, C. Faloutsos, R-MAT: a recursive model for graph mining, in: SIAM International Conference on Data Mining, SIAM.
[9] ClueWeb Data, ClueWeb09, 2009. http://boston.lti.cs.cmu.edu/clueweb09.
[10] T. Coffman, S. Greenblatt, S. Marcus, Graph-based technologies for intelligence analysis, Communications of the ACM 47, pp. 45-47.
[11] J. Cohen, Graph twiddling in a MapReduce world, IEEE Computing in Science & Engineering, pp. 29-41.
[12] D.J. Cook, L.B. Holder, Substructure discovery using minimum description length and background knowledge, Journal of Artificial Intelligence Research 1, pp. 231-255.
[13] D.J. Cook, L.B. Holder, Graph-based data mining, IEEE Intelligent Systems 15, pp. 32-41.
[14] J. Dean, S. Ghemawat, MapReduce: simplified data processing on large clusters, in: OSDI'04: Sixth Symposium on Operating Systems Design and Implementation, USENIX Association.
[15] M.R. Garey, D.S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, W.H. Freeman & Co., 1979.
[16] Graph 500 Steering Committee, Graph 500 benchmark, 2010. http://www.graph500.org/.
[17] J. Han, M. Kamber, Data Mining: Concepts and Techniques, second ed., Morgan Kaufmann Publishers, 2006.
[18] Integer Hash, Integer hash function, 2011. http://www.concentric.net/~ttwang/tech/inthash.htm.
[19] M. Kuramochi, G. Karypis, Finding frequent patterns in a large sparse graph, Data Mining and Knowledge Discovery 11, pp. 243-271.
[20] Lemur Toolkit, Lemur toolkit, 2011. http://lemurproject.org/lemur.php.
[21] J. Leskovec, D. Chakrabarti, J. Kleinberg, C. Faloutsos, Realistic, mathematically tractable graph generation and evolution, using Kronecker multiplication, in: Knowledge Discovery in Databases: PKDD 2005, vol. 3721, Springer, Berlin, Heidelberg, pp. 133-145.
[22] J. Leskovec, D. Chakrabarti, J. Kleinberg, C. Faloutsos, Z. Ghahramani, Kronecker graphs: an approach to modeling networks, Journal of Machine Learning Research 11, pp. 985-1042.
[23] Y. Liu, X. Jiang, H. Chen, J. Ma, X. Zhang, MapReduce-based pattern finding algorithm applied in motif detection for prescription compatibility network, in: J. Joller, Y. Dou, R. Gruber (Eds.), Lecture Notes in Computer Science, vol. 5737, pp. 341-355.
[24] A. Mislove, M. Marcon, K.P. Gummadi, P. Druschel, B. Bhattacharjee, Measurement and analysis of online social networks, in: Proceedings of the 5th ACM/Usenix Internet Measurement Conference, ACM.
[25] N. Nilsson, Principles of Artificial Intelligence, Tioga Publishing Co., 1980.
[26] S.J. Plimpton, K.D. Devine, MapReduce in MPI for large-scale graph algorithms, Parallel Computing 37, pp. 610-632.
[27] D. Shasha, J.T. Wang, R. Giugno, Algorithmics and applications of tree and graph searching, in: Proceedings of the 21st ACM PODS, 2002, pp. 39-52.
[28] C. Seshadhri, A. Pinar, T.G. Kolda, An in-depth analysis of stochastic Kronecker graphs, in: ICDM 2011: Proceedings of the 2011 IEEE International Conference on Data Mining, 2011, pp. 587-596.
[29] H. Tong, B. Gallagher, C. Faloutsos, T. Eliassi-Rad, Fast best-effort pattern matching in large attributed graphs, in: Proc. 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, pp. 737-746.
[30] J. Ullmann, An algorithm for subgraph isomorphism, Journal of the ACM 23, pp. 31-42.
[31] T. White, Hadoop: The Definitive Guide, second ed., O'Reilly Media, Yahoo Press, 2010.
[32] Z. Zhao, G. Wang, A.R. Butt, M. Khan, V.A. Kumar, M.V. Marathe, SAHad: subgraph analysis in massive networks using Hadoop, in: Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012, pp. 390-401.

    Published In

    Journal of Parallel and Distributed Computing, Volume 73, Issue 2
    February 2013
    171 pages

    Publisher

    Academic Press, Inc.

    United States

    Author Tags

    1. Graph mining
    2. MapReduce
    3. Pattern match
    4. Subgraph isomorphism
