skip to main content
research-article

GraMi: frequent subgraph and pattern mining in a single large graph

Published: 01 March 2014 Publication History

Abstract

Mining frequent subgraphs is an important operation on graphs; it is defined as finding all subgraphs that appear frequently in a database according to a given frequency threshold. Most existing work assumes a database of many small graphs, but modern applications, such as social networks, citation graphs, or protein-protein interactions in bioinformatics, are modeled as a single large graph. In this paper we present GraMi, a novel framework for frequent subgraph mining in a single large graph. GraMi undertakes a novel approach that only finds the minimal set of instances to satisfy the frequency threshold and avoids the costly enumeration of all instances required by previous approaches. We accompany our approach with a heuristic and optimizations that significantly improve performance. Additionally, we present an extension of GraMi that mines frequent patterns. Compared to subgraphs, patterns offer a more powerful version of matching that captures transitive interactions between graph nodes (like friend of a friend) which are very common in modern applications. Finally, we present CGraMi, a version supporting structural and semantic constraints, and AGraMi, an approximate version producing results with no false positives. Our experiments on real data demonstrate that our framework is up to 2 orders of magnitude faster and discovers more interesting patterns than existing approaches.

References

[1]
B. Bringmann. Mining Patterns in Structured Data. PhD thesis, KU Leuven, 2009.
[2]
B. Bringmann and S. Nijssen. What is frequent in a single graph? In Proc. of PAKDD, pages 858--863, 2008.
[3]
C. Chen, X. Yan, F. Zhu, and J. Han. gApprox: Mining frequent approximate patterns from a massive network. In Proc. of ICDM, pages 445--450, 2007.
[4]
J. Cheng, J. X. Yu, B. Ding, P. S. Yu, and H. Wang. Fast graph pattern matching. In Proc. of ICDE, pages 913--922, 2008.
[5]
Y.-R. Cho and A. Zhang. Predicting protein function by frequent functional association pattern mining in protein interaction networks. Trans. Info. Tech. Biomed., 14(1):30--36, Jan. 2010.
[6]
W.-T. Chu and M.-H. Tsai. Visual pattern discovery for architecture image classification and product image search. In Proc. of ICMR, pages 27:1--27:8, 2012.
[7]
D. J. Cook and L. B. Holder. Substructure discovery using minimum description length and background knowledge. Journal of Artificial Intelligence Research, 1(1):231--255, 1994.
[8]
S. de Givry, T. Schiex, and G. Verfaillie. Exploiting tree decomposition and soft local consistency in weighted CSP. In Proc. of AAAI, pages 22--27, 2006.
[9]
M. Deshpande, M. Kuramochi, and G. Karypis. Frequent sub-structure-based approaches for classifying chemical compounds. In Proc. of ICDM, pages 35--42, 2003.
[10]
A. Deutsch, M. Fernandez, and D. Suciu. Storing semistructured data with stored. In Proc. of SIGMOD, pages 431--442, 1999.
[11]
C. Domshlak, R. I. Brafman, and S. E. Shimony. Preference-based configuration of web page content. In Proc. of IJCAI, pages 1451--1456, 2001.
[12]
M. Fiedler and C. Borgelt. Subgraph support in a single large graph. In Proc. of ICDMW, pages 399--404, 2007.
[13]
M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman & Co., 1979.
[14]
S. Ghazizadeh and S. S. Chawathe. Seus: Structure extraction using summaries. In Proc. of DS, pages 71--85, 2002.
[15]
V. Guralnik and G. Karypis. A scalable algorithm for clustering sequential data. In Proc. of ICDM, pages 179--186, 2001.
[16]
H. He and A. K. Singh. Graphs-at-a-time: query language and access methods for graph databases. In Proc. of SIGMOD, pages 405--418, 2008.
[17]
A. Khan, X. Yan, and K.-L. Wu. Towards proximity pattern mining in large graphs. In Proc. of SIGMOD, pages 867--878, 2010.
[18]
M. Kuramochi and G. Karypis. Frequent subgraph discovery. In Proc. of ICDM, pages 313--320, 2001.
[19]
M. Kuramochi and G. Karypis. Grew - A scalable frequent subgraph discovery algorithm. In Proc. of ICDM, pages 439--442, 2004.
[20]
M. Kuramochi and G. Karypis. Finding frequent patterns in a large sparse graph. Data Mining and Knowledge Discovery, 11(3):243--271, 2005.
[21]
J. Lee, W.-S. Han, R. Kasperovics, and J.-H. Lee. An in-depth comparison of subgraph isomorphism algorithms in graph databases. PVLDB, 6(2):133--144, Dec. 2012.
[22]
A. Mackworth. Consistency in networks of relations. Artificial Intelligence, 8(1):99--118, 1977.
[23]
J. J. McGregor. Relational consistency algorithms and their application in finding subgraph and graph isomorphisms. Information Sciences, 19: 228--250, 1979.
[24]
S. Ranu and A. K. Singh. GraphSig: A scalable approach to mining significant subgraphs in large graph databases. In Proc. of ICDE, pages 844--855, 2009.
[25]
Z. Sun, H. Wang, H. Wang, B. Shao, and J. Li. Efficient subgraph matching on billion node graphs. PVLDB, 5(9):788--799, May 2012.
[26]
L. T. Thomas, S. R. Valluri, and K. Karlapalem. Margin: Maximal frequent subgraph mining. TKDD, 4(3): 10:1--10:42, 2010.
[27]
J. R. Ullmann. An algorithm for subgraph isomorphism. Journal of ACM, 23: 31--42, 1976.
[28]
X. Yan, H. Cheng, J. Han, and P. S. Yu. Mining significant graph patterns by leap search. In Proc. of SIGMOD, pages 433--444, 2008.
[29]
X. Yan and J. Han. gSpan: Graph-based substructure pattern mining. In Proc. of ICDM, pages 721--724, 2002.
[30]
X. Yan and J. Han. CloseGraph: mining closed frequent graph patterns. In Proc. of SIGKDD, pages 286--295, 2003.
[31]
X. Yan, P. S. Yu, and J. Han. Graph indexing: a frequent structure-based approach. In Proc. of SIGMOD, pages 335--346, 2004.
[32]
R. Zafarani and H. Liu. Social computing data repository at ASU, 2009.
[33]
F. Zhu, X. Yan, J. Han, and P. S. Yu. gPrune: A constraint pushing framework for graph pattern mining. In Proc. of PAKDD, pages 388--400, 2007.
[34]
L. Zou, L. Chen, and M. T. Özsu. Distance-join: pattern match query in a large graph database. PVLDB, 2(1):886--897, 2009.

Cited By

View all
  1. GraMi: frequent subgraph and pattern mining in a single large graph

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image Proceedings of the VLDB Endowment
    Proceedings of the VLDB Endowment  Volume 7, Issue 7
    March 2014
    108 pages
    ISSN:2150-8097
    Issue’s Table of Contents

    Publisher

    VLDB Endowment

    Publication History

    Published: 01 March 2014
    Published in PVLDB Volume 7, Issue 7

    Qualifiers

    • Research-article

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)121
    • Downloads (Last 6 weeks)3
    Reflects downloads up to 02 Feb 2025

    Other Metrics

    Citations

    Cited By

    View all

    View Options

    Login options

    Full Access

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Figures

    Tables

    Media

    Share

    Share

    Share this Publication link

    Share on social media