skip to main content
research-article

Graph pattern matching: from intractable to polynomial time

Published: 01 September 2010 Publication History

Abstract

Graph pattern matching is typically defined in terms of subgraph isomorphism, which makes it an np-complete problem. Moreover, it requires bijective functions, which are often too restrictive to characterize patterns in emerging applications. We propose a class of graph patterns, in which an edge denotes the connectivity in a data graph within a predefined number of hops. In addition, we define matching based on a notion of bounded simulation, an extension of graph simulation. We show that with this revision, graph pattern matching can be performed in cubic-time, by providing such an algorithm. We also develop algorithms for incrementally finding matches when data graphs are updated, with performance guarantees for dag patterns. We experimentally verify that these algorithms scale well, and that the revised notion of graph pattern matching allows us to identify communities commonly found in real-world networks.

References

[1]
S. Amer-Yahia, M. Benedikt, and P. Bohannon. Challenges in searching online communities. IEEE Data Eng. Bull., 30(2):23--31, 2007.
[2]
J. Bang-Jensen and G. Z. Gutin. Digraphs: Theory, Algorithms and Applications. Springer, 2008.
[3]
T. Y. Berger-Wolf and J. Saia. A framework for analysis of dynamic social networks. In KDD, 2006.
[4]
N. Bruno, N. Koudas, and D. Srivastava. Holistic twig joins: optimal XML pattern matching. In SIGMOD, 2002.
[5]
E. P. Chan and H. Lim. Optimization and evaluation of shortest path queries. VLDB J., 16(3):343--369, 2007.
[6]
L. Chen, A. Gupta, and M. E. Kurul. Stack-based algorithms for pattern matching on dags. In VLDB, 2005.
[7]
J. Cheng, J. X. Yu, B. Ding, P. S. Yu, and H. Wang. Fast graph pattern matching. In ICDE, 2008.
[8]
J. Cheng, J. X. Yu, X. Lin, H. Wang, and P. S. Yu. Fast computing reachability labelings for large graphs with high compression rate. In EDBT, 2008.
[9]
J. Cho, N. Shivakumar, and H. Garcia-Molina. Finding replicated Web collections. SIGMOD Rec., 29(2), 2000.
[10]
E. Cohen, E. Halperin, H. Kaplan, and U. Zwick. Reachability and distance queries via 2-hop labels. SICOMP, 32(5), 2003.
[11]
W. Fan and P. Bohannon. Information preserving XML schema embedding. TODS, 33(1), 2008.
[12]
W. Fan, J. Li, S. Ma, H. Wang, and Y. Wu. Graph homomorphism revisited for graph matching. PVLDB, 3, 2010.
[13]
B. Gallagher. Matching structure and semantics: A survey on graph-based pattern matching. AAAI FS., 2006.
[14]
H. He and A. K. Singh. Graphs-at-a-time: query language and access methods for graph databases. In SIGMOD, 2009.
[15]
M. R. Henzinger, T. Henzinger, and P. Kopke. Computing simulations on finite and infinite graphs. In FOCS, 1995.
[16]
R. Jin, Y. Xiang, N. Ruan, and D. Fuhry. 3-hop: a high-compression indexing scheme for reachability query. In SIGMOD, 2009.
[17]
T. Milo and D. Suciu. Index structures for path expressions. In ICDT, 1999.
[18]
L. D. Nardo, F. Ranzato, and F. Tapparo. The subgraph similarity problem. TKDE, 21(5):748--749, 2009.
[19]
M. Natarajan. Understanding the structure of a drug trafficking organization: a conversational analysis. Crime Prevention Studies, 11:273--298, 2000.
[20]
G. Ramalingam and T. Reps. An incremental algorithm for a generalization of the shortest-path problem. J. Algorithms, 21(2):267--305, 1996.
[21]
G. Ramalingam and T. Reps. On the computational complexity of dynamic graph problems. TCS, 158(1--2), 1996.
[22]
G. Ramalingam and T. W. Reps. A categorized bibliography on incremental computation. In POPL, 1993.
[23]
R. Ronen and O. Shmueli. SoQL: A language for querying and creating data in social networks. In ICDE, 2009.
[24]
D. Saha. An incremental bisimulation algorithm. In FSTTCS, 2007.
[25]
D. Shasha, J. T. L. Wang, and R. Giugno. Algorithmics and applications of tree and graph searching. In PODS, 2002.
[26]
L. Terveen and D. W. McDonald. Social matching: A framework and research agenda. ACM Trans. Comput.-Hum. Interact., 12(3), 2005.
[27]
H. Tong, C. Faloutsos, B. Gallagher, and T. Eliassi-Rad. Fast best-effort pattern matching in large attributed graphs. In KDD, 2007.
[28]
J. R. Ullmann. An algorithm for subgraph isomorphism. J. ACM, 23(1), 1976.
[29]
H. Wang, H. He, J. Yang, P. S. Yu, and J. X. Yu. Dual labeling: Answering graph reachability queries in constant time. In ICDE, 2006.
[30]
K. Yi, H. He, I. Stanoi, and J. Yang. Incremental maintenance of XML structural indexes. In SIGMOD, 2004.
[31]
L. Zou, L. Chen, and M. T. Özsu. Distance-join: Pattern match query in a large graph database. In VLDB, 2009.

Cited By

View all
  1. Graph pattern matching: from intractable to polynomial time

      Recommendations

      Comments

      Information & Contributors

      Information

      Published In

      cover image Proceedings of the VLDB Endowment
      Proceedings of the VLDB Endowment  Volume 3, Issue 1-2
      September 2010
      1658 pages

      Publisher

      VLDB Endowment

      Publication History

      Published: 01 September 2010
      Published in PVLDB Volume 3, Issue 1-2

      Qualifiers

      • Research-article

      Contributors

      Other Metrics

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months)43
      • Downloads (Last 6 weeks)2
      Reflects downloads up to 05 Jan 2025

      Other Metrics

      Citations

      Cited By

      View all

      View Options

      Login options

      Full Access

      View options

      PDF

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      Media

      Figures

      Other

      Tables

      Share

      Share

      Share this Publication link

      Share on social media