skip to main content
10.1145/3299869.3300072acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

CATAPULT: Data-driven Selection of Canned Patterns for Efficient Visual Graph Query Formulation

Published: 25 June 2019 Publication History

Abstract

Visual graph query interfaces (a.k.a gui ) widen the reach of graph querying frameworks across different users by enabling non-programmers to use them. Consequently, several commercial and academic frameworks for querying a large collection of small- or medium-sized data graphs (\textite.g., chemical compounds) provide such visual interfaces. Majority of these interfaces expose a fixed set ofcanned patterns (\textiti.e., small subgraph patterns) to expedite query formulation by enabling pattern-at-a-time in lieu of edge-at-a-time construction mode. Canned patterns to be displayed on a gui are typically selected manually based on domain knowledge. However, manual generation of canned patterns is labour intensive. Furthermore, these patterns may not sufficiently cover the underlying data graphs to expedite visual formulation of a wide range of subgraph queries. In this paper, we present a generic and extensible framework called Catapult to address these limitations. Catapult takes a data-driven approach toautomatically select canned patterns, thereby taking a concrete step towards the vision of data-driven construction of visual query interfaces. Specifically, it firstclusters the underlying data graphs based on their topological similarities and thensummarize each cluster to create acluster summary graph (csg ). The canned patterns within a user-specifiedpattern budget are then generated from these csg s by maximizingcoverage anddiversity, and minimizingcognitive load of the patterns. Experimental study with real-world datasets and visual graph interfaces demonstrates the superiority of Catapult compared to traditional techniques.

References

[1]
A.V. Aho, J.E. Hopcroft, J.E. Ullman. The design and analysis of computer algorithms. Addison-Wesley, 1974.
[2]
S. Arora, E. Hazan, S. Kale. The multiplicative weights update method: a meta-algorithm and applications. Theory Comput. 8(1), 2012.
[3]
D. Arthur, S. Vassilvitskii. How slow is the k-means method?. In SCG, 2006.
[4]
D. Arthur, S. Vassilvitskii. k-means++: The advantages of careful seeding. In SIAM, 2007.
[5]
B. Bahmani,et al. Scalable k-means++. In VLDB, 2012.
[6]
S. S. Bhowmick, B. Choi, C. E. Dyreson. Data-driven visual graph query interface construction and maintenance: challenges and opportunities. PVLDB, 9(12): 984--992, 2016.
[7]
H. Bunke. On a relation between graph edit distance and maximum common subgraph. Pattern Recogn. Lett., 18(8):689--694, 1997.
[8]
T.Y.S. But, P.H. Toy. The Mitsunobu reaction: origin, mechanism, improvements, and applications. Chem. Asian J., 2(11):1340--1355, 2007.
[9]
Y. Chi, et al. Canonical forms for labelled trees and their applications in frequent subtree mining. Knowl. Inf. Syst., 8(2), 2005.
[10]
Y. Chi, et al. Indexing and mining free trees. In ICDM, 2003.
[11]
W.G. Cochran. Sampling techniques. Third edition. Wiley, New York, New York, USA.
[12]
C.A.C Coello, G.B. Lamont, D.A. Van Veldhuizen. Evolutionary algorithms for solving multi-objective problems. 2nd Edition, Springer, 2007.
[13]
D. Conte, P. Foggia, M. Vento. Challenging complexity of maximum common subgraph detection algorithms: a performance analysis of three algorithms on a wide database of graphs. J. Graph Algorithms Appl., 11(1):99--143, 2007.
[14]
L.P. Cordella, P. Foggia, C. Sansone. A (sub)graph isomorphism algorithm for matching large graphs. IEEE Trans. Pattern Anal. Mach. Intell., 26(10):1367--1372, 2004.
[15]
L. Laura Faulkner. Beyond the five-user assumption: Benefits of increased sample sizes in usability testing. Behavior Research Methods, Instruments, & Computers, 35(3), 2003.
[16]
F. Geerts, et al. Relational link-based ranking. In VLDB, 2004.
[17]
C. Guestrin, A. Krause, A.P. Singh. Near-optimal sensor placements in gaussian processes. In ICML, 2005.
[18]
S. Günter, H. Bunke. Self-organizing map for clustering in the graph domain. Pattern Recogn. Lett., 23(4):405--417, 2002.
[19]
H. He, A.K. Singh. Closure-tree: An index structure for graph queries. In ICDE, 2006.
[20]
W. Huang, P. Eades, S.H. Hong. Measuring effectiveness of graph visualizations: A cognitive load perspective. Inf. Vis., 8(3): 139--152, 2009.
[21]
K. Jain, V.V. Vazirani. Approximation algorithms for metric facility location and k-median problems using the primal-dual schema and Lagrangian relaxation. J. ACM, 48(2):274--296, 2001.
[22]
N. Jiang, Y. Bu, Y. Wang, M. Nie, D. Zhang, X. Zhai. Design, synthesis and structure-activity relationships of novel diaryl urea derivatives as potential EGFR inhibitors. Molecules, 21(11): 1572, 2016.
[23]
H. Kai et al. CATAPULT: data-driven selection of canned patterns for efficient visual graph query formulation. Technical Report. Available at: http://www.ntu.edu.sg/home/assourav/TechReports/catapult-TR. pdf, June 2018.
[24]
S. Khuller, A. Moss, J. Naor. The budgeted maximum coverage problem. Inf. Process. Lett., 70(1): 39--45, 1999.
[25]
S.G. Kobourov, S. Pupyrev, B. Saket. Are crossings important for drawing large graphs? In GD, 2014.
[26]
J. Lazar, J.H. Feng, H. Hochheiser. Research methods in humancomputer interaction. John Wiley & Sons, 2010.
[27]
J.J. McGregor. Backtrack search algorithms and the maximal common subgraph problem. Software: Practice and Experience, 12(1):29--34, 1982.
[28]
M.D. McKay, R.J Beckman,W.J. Conover. Comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics, 21(2):239--245, 1979.
[29]
M. Meila. The uniqueness of a good optimum for k-means. In ICML, 2006.
[30]
S. Nijssen, J.N. Kok. The gaston tool for frequent subgraph mining. Electron. Notes Theor. Comput. Sci., 127(1): 77--87, 2005.
[31]
T. Ramraj, R. Prabhakar. Frequent subgraph mining algorithms - a survey. Procedia Computer Science, 47:197--204, 2015.
[32]
K. Riesen, M. Neuhaus, H. Bunke. Bipartite graph matching for computing the edit distance of graphs. In GbRPR, 2007.
[33]
S. Sakai, M. Togasaki, K. Yamazaki. A note on greedy algorithms for the maximum weighted independent set problem. Discrete Appl. Math. 126(2--3): 313--322, 2003.
[34]
S.E. Schaeffer. Graph clustering. Comput. Sci. Rev., 1(1):27--64, 2007.
[35]
T. Schäfer, P. Mutzel. StruClus: structural clustering of large-scale graph databases. CoRR abs/1609.09000, 2016.
[36]
H. Shang, X. Lin, Y. Zhang, J.X. Yu, W. Wang. Connected substructure similarity search. In SIGMOD, 2010.
[37]
C. Tofallis. Add or multiply? A tutorial on ranking and choosing with multiple criteria. INFORMS Trans. on Education, 14(3): 109--119, 2014.
[38]
H. Toivonen. Sampling large databases for association rules. In VLDB, 1996.
[39]
Y. Chi, R.R. Muntz, S. Nijssen. Frequent subtree mining. In T. Washio, et al., editors, Advances in Mining Graphs, Trees and Sequences, 2005.
[40]
J. Zhang, et al. DaVinci: Data-driven visual interface construction for subgraph search in graph databases. In ICDE, 2015.
[41]
H. Zhang, V. Raj, T. Sellam, E. Wu. Precision interfaces for different modalities. In SIGMOD, 2018.

Cited By

View all

Index Terms

  1. CATAPULT: Data-driven Selection of Canned Patterns for Efficient Visual Graph Query Formulation

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    SIGMOD '19: Proceedings of the 2019 International Conference on Management of Data
    June 2019
    2106 pages
    ISBN:9781450356435
    DOI:10.1145/3299869
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 25 June 2019

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. data-driven construction
    2. graph database
    3. subgraph query
    4. visual query formulation
    5. visual query interface

    Qualifiers

    • Research-article

    Funding Sources

    Conference

    SIGMOD/PODS '19
    Sponsor:
    SIGMOD/PODS '19: International Conference on Management of Data
    June 30 - July 5, 2019
    Amsterdam, Netherlands

    Acceptance Rates

    SIGMOD '19 Paper Acceptance Rate 88 of 430 submissions, 20%;
    Overall Acceptance Rate 785 of 4,003 submissions, 20%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)27
    • Downloads (Last 6 weeks)0
    Reflects downloads up to 24 Dec 2024

    Other Metrics

    Citations

    Cited By

    View all

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media