research-article

GraMi: frequent subgraph and pattern mining in a single large graph

Authors: Mohammed Elseidy, Ehab Abdelhamid, Spiros Skiadopoulos, Panos Kalnis

Proceedings of the VLDB Endowment, Volume 7, Issue 7

Pages 517 - 528

https://doi.org/10.14778/2732286.2732289

Published: 01 March 2014

Abstract

Mining frequent subgraphs is an important operation on graphs; it is defined as finding all subgraphs that appear frequently in a database according to a given frequency threshold. Most existing work assumes a database of many small graphs, but modern applications, such as social networks, citation graphs, or protein-protein interactions in bioinformatics, are modeled as a single large graph. In this paper we present GraMi, a novel framework for frequent subgraph mining in a single large graph. GraMi undertakes a novel approach that only finds the minimal set of instances to satisfy the frequency threshold and avoids the costly enumeration of all instances required by previous approaches. We accompany our approach with a heuristic and optimizations that significantly improve performance. Additionally, we present an extension of GraMi that mines frequent patterns. Compared to subgraphs, patterns offer a more powerful version of matching that captures transitive interactions between graph nodes (like friend of a friend) which are very common in modern applications. Finally, we present CGraMi, a version supporting structural and semantic constraints, and AGraMi, an approximate version producing results with no false positives. Our experiments on real data demonstrate that our framework is up to 2 orders of magnitude faster and discovers more interesting patterns than existing approaches.
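The idea of stopping as soon as the frequency threshold is met, rather than enumerating every instance, can be illustrated with a minimal sketch. The abstract does not define the frequency measure, so this sketch assumes one: a pattern counts as frequent when each of its nodes can be mapped to at least `tau` distinct graph nodes. It also handles only a one-edge pattern in an undirected labeled graph. The function name `is_frequent_edge` and the `edges`/`labels` representation are hypothetical and are not taken from the paper.

```python
def is_frequent_edge(edges, labels, label_u, label_v, tau):
    """Decide whether the one-edge pattern (label_u)-(label_v) is frequent.

    Assumed measure: each pattern node needs at least `tau` distinct
    graph nodes that take part in some match. The scan stops as soon as
    both sides reach `tau`, so the full list of matches is never built.
    """
    images_u, images_v = set(), set()
    for a, b in edges:
        # An undirected edge can match the pattern in either orientation.
        for x, y in ((a, b), (b, a)):
            if labels[x] == label_u and labels[y] == label_v:
                images_u.add(x)
                images_v.add(y)
        if len(images_u) >= tau and len(images_v) >= tau:
            return True  # threshold met; remaining edges are not inspected
    return False


# Toy graph: node id -> label, plus an undirected edge list.
labels = {1: "A", 2: "B", 3: "A", 4: "B", 5: "A"}
edges = [(1, 2), (3, 4), (5, 2)]

print(is_frequent_edge(edges, labels, "A", "B", tau=2))
print(is_frequent_edge(edges, labels, "A", "B", tau=3))
```

With `tau=2` the scan ends after the second edge, because both pattern nodes already have two distinct images. With `tau=3` it reads every edge and still fails, since only two nodes labeled `B` take part in a match.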

Published In

Proceedings of the VLDB Endowment Volume 7, Issue 7

March 2014

108 pages

ISSN:2150-8097

Editors: H. V. Jagadish (University of Michigan), Aoying Zhou (East China Normal University, China)

Publisher

VLDB Endowment
